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13.  abstract 


‘‘What  does  it  mean  to  compute  an  eigendecomposition  of  an  uncertain  matrix? 
Because  of  measurement  errors  and  roundoff  errors,  one  must  typically  compute  the 
eigenvalues  and  eigenvectors  not  of  a  single  matrix  but  rather  of  a  ball  of  matrices 
whose  radius  depends  on  the  uncertainty  in  the  data.  We  approach  this  problem  by 
asking  how  to  partition  the  eigenvalues  of  the  matrices  in  the  ball  into  nonover¬ 
lapping  groups  which  cannot  themselves  be  further  partitioned.  More  specifically,  we 

...  j.-r,  '!»<-  I  5 

define  the  dli-iociation  of  two  subsets  Oj  and  cr2=&  a i  of  the  sets  of  eigenvalues 

, .j*. ..  \  i ■  •  1 

a  of  a  matrix  T  as  the  smallest  perturbation  of  T  that  will  make  some  eigenvalue 

.,,>A  3 

from  a i  and  some  eigenvalue  from  oz  move  together  and  become  indistinguishable. 

If  T  is  the  center  of  the  ball  of  matrices,  and  the  dissociation  of  at  and  a2  is 

(continued  on  back) 


greater  than  the  radius  of  the  ball,  then  cx  and  a2  are  nonover¬ 
lapping  groups  of  eigenvalues;  otherwise  the  dissociation  is  less  than 
or  equal  to  the  radius  and  oi  and  are  not  distinguishable  groups. 

By  computing  the  dissociation  for  various  c*i  and  cs2,  we  may  compute 
our  desired  partition  of  a. 

j  The  results  of  this  thesis  are  of  two  kinds.  First,  we  compute 
upper  and  lower  bounds  on  the  dissociation  which  improve  bounds  in  the 
literature.  Both  upper  and  lower  bounds  are  achievable  or  nearly  so. 

The  upper  and  lower  bounds  are  often  close  together  but  occasionally 
far  apart.  Our  second  set  of  results  quantifies  this  last  statement 
by  assuming  a  probability  density  on  the  set  of  matrices  and  computing 
the  likelihood  that  the  bounds  are  far  apart.  This  approach  leads  to 
numerous  other  probabilistic  results,  such  as  the  distribution  of  the 
condition  number  of  a  random  matrix,  and  the  distribution  of  the  distance 


from  a  random  matrix  to  one  with  a  given  Jordan  form.  We  discuss  the 


relevance  of  this  probabilistic  model  to  finite  pre 


calculations . 
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A  Numerical  Analyst's  Jordan  Canonical  Fbrm 

James  Weldon  Demmel 

ABSTRACT 

What  does  it  mean  to  compute  an  eigendecomposition  of  an  uncertain 
matrix?  Because  of  measurement  errors  and  roundoff  errors,  one  must  typi¬ 
cally  compute  the  eigenvalues  and  eigenvectors  not  of  a  single  matrix  but 
rather  of  a  ball  of  matrices  whose  radius  depends  on  the  uncertainty  in  the 
data.  We  approach  this  problem  by  asking  how  to  partition  the  eigenvalues  of 
the  matrices  in  the  ball  into  nonoverlapping  groups  which  cannot  themselves 
be  further  partitioned.  More  specifically,  we  define  the  dissociation  of  two 
subsets  (7{  and  a2-a  \  oJ  of  the  set  of  eigenvalues  a  of  a  matrix  T  as  the 
smallest  perturbation  of  T  that  will  make  some  eigenvalue  from  oy  and  some 
eigenvalue  from  <j2  move  together  and  become  indistinguishable.  If  7  is  the 
center  of  the  ball  of  matrices,  and  the  dissociation  of  <J1  and  az  is  greater 
than  the  radius  of  the  ball,  then  9j  and  Og  are  nonoverlapping  groups  of 
eigenvalues;  otherwise  the  dissociation  is  less  than  or  equal  to  the  radius  and 
Cj  and  <Tg  are  not  distinguishable  groups.  By  computing  the  dissociation  for 
various  (J\  and  eg.  we  may  compute  our  desired  partition  of  o. 

The  results  of  this  thesis  are  of  two  kinds.  First,  we  compute  upper  and 
lower  bounds  on  the  dissociation  which  improve  bounds  in  the  literature. 
Both  upper  and  lower  bounds  are  achievable  or  nearly  so.  The  upper  and 
lower  bounds  are  often  close  together  but  occasionally  far  apart.  Our  second 
set  of  results  quantifies  this  last  statement  by  assuming  a  probability  density 
on  the  set  of  matrices  and  computing  the  likelihood  that  the  bounds  are  far 
apart  This  approach  leads  to  numerous  other  probabilistic  results,  such  as 
the  distribution  of  the  condition  number  of  a  random  matrix,  and  the  distri- 


2 

button  of  tbo  distance  from  a  random  matrix  to  ana  with  a  given  Jordan  form. 
Wo  discuss  tbo  relevance  of  this  probabilistic  modal  to  finite  precisian  ealcu* 
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Chapter  1:  Introduction 

Given  a  complex  n  by  n  matrix  Tq  known  only  to  within  a  tolerance  e>0. 
what  does  it  mean  to  compute  an  eigendecomposition  of  To?  By  knowing  To 
only  to  a  tolerance  t  we  mean  that  Tq  is  indistinguishable  from  any  matrix  in 
the  set 

T(t)*\T:\\T0-T\\  <e{  .  (1.1) 

(||  T  ||  denotes  some  norm  of  the  matrix  7*;  we  will  specify  the  norm  later.) 

We  would  like  to  produce  an  eigendecomposition  which  is  valid  in  some  way 
for  all  matrices  in  T(e).  and  gives  as  much  information  as  possible  about  all 
matrices  in  T(e).  This  seemingly  simple  goal  leads  us  along  several  interest¬ 
ing  paths  which  will  now  be  explored. 

First  some  notation.  An  eigendecomposition  of  a  matrix  T  will  be  writ¬ 
ten 

T  =  S9S~'  (1.2) 

where  9  is  a  block  diagonal  matrix  9  =  diag(9{ . 06).  T's  spectrum  will 

be  denoted  by  o(T)  or  merely  a  if  T  is  clear  from  context,  and  9/s  spectrum 
by  a<  for  short.  Thus  <y  =  If  <rt  contains  m  eigenvalues  (counting  multi- 

i 

plicities)  we  write  #(cri)sm. 

The  following  sequence  of  examples  will  illustrate  the  difficulties  encoun¬ 
tered  in  computing  an  eigendecomposition  of  T(e)  for  different  values  of  e. 
Consider  the  following  matrix,  which  is  essentially  in  Jordan  canonical  form: 


2 


7o  = 


1.001 


100 

0 


-1 


(1.3) 


(blanks  and  0  both  denote  zero  entries).  This  decomposition  tells  us  several 
things:  that  T0  has  4  distinct  eigenvalues  at  1.001.  1,  0,  and  -1.  that  each 
nonzero  eigenvalue  has  a  one  dimensional  invariant  subspace  (i.e.  an  eigen¬ 
vector)  associated  with  it,  and  that  associated  with  0  is  one  two  dimensional 
and  one  one  dimensional  invariant  subspace. 


Does  this  information  remains  valid  for  all  matrices  in  T(c)  as  e 
increases  from  0?  As  soon  as  e  becomes  nonzero,  it  is  no  longer  true  that  all 
matrices  in  T(c)  have  a  triple  eigenvalue  at  0,  nor  two  invariant  subspaces 
associated  with  eigenvalues  near  0.  For  example, 


T  i 


1 

1.001 

0  100 
e  0 

0 

-1 


has  3  simple  eigenvalues  at  0,  10Ve,  and  —  10Ve  each  with  its  own  eigenvec¬ 
tor,  and 


T.= 


1.001 


100 

0 


-1 


has  one  triple  eigenvalue  at  0  with  just  one  three  dimensional  invariant  sub- 
space  associated  with  it. 


Thus,  all  matrices  in  7(e)  (for  e  small  enough)  have  three  eigenvalues 
near  to  0  which  together  have  a  three  dimensional  invariant  subspace  associ- 


ated  with  them.  We  cannot,  however,  identify  them  individually  because  they 
could  all  simultaneously  equal  0  (in  the  case  of  Tq)',  their  only  identities  are 
as  members  of  a  cluster  of  three. 

As  e  increases  to  .0005,  matrices  occur  in  T(e)  which  no  longer  have  two 
simple  eigenvalues  around  1: 

1.0005  r} 

1.0005 

0  100 
0 

0 

-1 

Ta  has  two  eigenvalues  at  1.0005  associated  with  a  two  dimensional  invariant 
subspace  and  for  rj*0  but  arbitrarily  small  this  subspace  cannot  be  split  into 
two  one  dimensional  subspaces.  Thus,  when  c  exceeds  .0005  (but  not  .005), 
T(e)  has  one  three  dimensional  invariant  subspace  with  three  eigenvalues 
indistinguishable  from  0,  one  two  dimensional  subspace  with  two  eigenvalues 
indistinguishable  from  1.0005,  and  one  simple  eigenvalue  at  *1. 

In  particular,  one  may  draw  three  disjoint  simple  closed  curves  (Jordan 
curves)  in  the  complex  plane,  one  around  0,  one  around  1,  and  one  around  -1, 
such  that  any  TcTCOOOS)  will  have  three  eigenvalues  clustered  strictly  inside 
the  first  curve,  two  inside  the  second,  and  one  inside  the  third.  Furthermore, 
it  is  impossible  to  draw  any  larger  number  of  such  curves  such  that  each  one 
will  strictly  contain  a  fixed  number  of  eigenvalues  of  each  T€ T(.0005).  This 
last  statement  is  true  because  within  T(.0005)  there  is  a  matrix  (T$)  with  a 
double  eigenvalue  at  1.0005  and  a  triple  eigenvalue  at  0. 

For  values  of  e  exceeding  .005,  say  .01,  the  clustering  of  the  eigenvalues 
changes  again.  The  matrix 
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7*4 


1 

1.001 

0  100 

.01  0 

0 

-1 


has  simple  eigenvalues  at  1.001  and  0,  and  double  eigenvalues  at  1  and  -1. 
Looking  at  the  eigenvalues  as  functions  of  the  entry  containing  .01  (the  3.4 
entry),  that  f4  has  a  pair  of  eigenvalues  at  tlOVT'*  a, 4  =  ±1  when  T4sa  -  .01. 
Thus,  no  Jordan  curve  can  be  drawn  which  separates  the  eigenvalues  into  dis¬ 
joint  regions  as  was  done  in  the  case  of  T(.0005)  or  for  T(e)  with  smaller  e. 
This  is  because  the  eigenvalues  "near  0"  can  no  longer  be  separated  from  the 
eigenvalues  near  -1  nor  1,  and  neither  can  the  eigenvalue  at  1.001  be 
separated  from  1.  Thus,  one  Jordan  curve  must  be  drawn  containing  all  the 
eigenvalues. 

In  the  case  of  T(.0005)  we  could  find  a  matrix  (Ts)  with  a  single  multiple 
eigenvalue  within  the  region  bounded  by  each  Jordan  curve.  It  seems  natural 
to  think  of  all  matrices  in  T(.000S)  as  being  small  perturbations  of  one  with  a 
double  eigenvalues  at  1.0005,  a  triple  eigenvalue  at  0,  and  a  simple  eigen¬ 
value  at  -1  (Ta).  The  existence  of  Ts  also  provides  a  simple  explanation  for 
not  being  able  to  distinguish  the  three  eigenvalues  near  0  or  the  two  near  1. 
Is  it  possible  to  find  a  matrix  with  a  sextuple  eigenvalue  in  T(.0l)?  More  gen- 
erally.  given  a  T(e)  and  a  clustering  of  eigenvalues  which  can  not  be  refined 
by  drawing  more  separating  Jordan  curves,  is  it  possible  to  find  T's  in  T(e) 
which  have  single  eigenvalues  in  place  of  each  cluster?  The  answer  is  no.  We 
and  independently  Wilkinson  have  produced  examples  such  as  [Wilkinson4] 
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V  1 
2  rt  1 

r°  =  3r?  1 

Ar) 

where  for  e>rj4  (tj«1)  one  Jordan  curve  around  the  entire  spectrum  of  T(e) 
must  be  drawn,  but  where  e  must  exceed  something  of  order  before  a 

matrix  with  a  single  quadruple  eigenvalue  can  be  found  in  T(e).  We  call  the 
eigenproblem  for  T (c‘)  (tj 4<e’<qa)  ill  posed,  because  while  no  nonempty 
proper  subset  of  the  eigenvalues  is  distinguishable  (by  being  separable  by  a 
Jordan  curve  from  the  remaining  eigenvalues),  matrices  in  T(c )  cannot  be 
thought  of  as  perturbations  of  some  particular  matrix  in  T (e)  with  a  single 
quadruple  eigenvalue.  The  problem  of  locating  the  nearest  matrix  with  just 
one  eigenvalue  is  called  the  "nearest  completely  defective  matrix  problem." 

The  central  problem  in  this  sequence  of  examples  has  been  how  to  clus¬ 
ter  the  eigenvalues  into  distinguishable  groups,  how  to  name  the  eigenvalues. 
There  are  at  least  two  notions  of  clustering  for  eigenvalues.  So  far  we  have 
sought  a  collection  of  Jordan  curves  such  that  the  region  bounded  by 
each  /<  contains  the  same  number  of  eigenvalues  (counting  multiplicities)  of 
each  TeT(e).  This  number  of  eigenvalues  will  be  called  the  content  of  Jt. 
The  easiest  way  to  see  how  these  curves  depend  on  Tq  and  e  is  to  consider 
the  set  o(T(e))  of  all  eigenvalues  of  all  Te T(e).  o(T(s))  is  an  open  set  and  can 
be  written  as  the  disjoint  union  of  its  connected  components  Oi( T(e)).  Around 
each  Oi(T(e))  one  can  draw  a  Jordan  curve  with  o<(T(£))  strictly  inside 
and  all  other  cy( T(e))  strictly  outside.  These  Jordan  curves  cluster  the  eigen¬ 
values  of  To  into  regions  in  a  way  that  also  clusters  the  eigenvalues  of  each 
Te T(e).  This  notion  of  naming  an  eigenvalue  by  the  component  of  o(T(t))  in 
which  it  lies  will  be  called  region  clustering. 
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There  is  another  useful  notion  of  clustering  or  naming.  It  will  be 
described  briefly  here,  with  the  formal  definition  left  to  the  next  chapter. 
Let  \  be  an  eigenvalue  of  T0.  and  let  T(x)  be  a  continuous  path  starting  at 
T(0)-Ta  and  remaining  in  T(e)  for  all  x>0.  Think  of  as  a  function  of  x. 
varying  continuously  as  a  function  of  x .  As  long  as 


X«(x)*Xj(x)  for  all  x  and  (•) 

\(x)  can  be  unambiguously  identified  with  X*.  If  (•)  is  true  for  all  paths  T(x) 

in  T(e),  then  X*  represents  a  cluster  (of  content  1)  for  all  matrices  in  T(e). 

This  definition  makes  sense  because  as  long  as  Xi(x)  never  equals  Xj(x).  it 

can  be  identified  by  naming  it  by  the  X*  and  the  path  T[x)  whence  it  came. 

If,  on  the  other  hand,  there  is  a  path  T(x)  in  T(e)  and  an  x0  such  that 

X1(x0)=XJ  (x0),  then  we  put  \  and  X;  into  the  same  cluster.  In  this  way,  a 

unique  clustering  of  a  is  constructed.  This  clustering  method  will  be  called 

path  clustering.  It  will  be  shown  in  the  next  chapter  that  given  T(e).  this 

path  clustering  always  produces  at  least  as  refined  a  clustering  as  does 

region  clustering. 


For  numerical  reasons  to  be  discussed  in  a  moment,  one  may  add 
another  constraint  to  the  clustering  of  eigenvalues.  Consider 


7o(x)  = 


—X  1 

1  1/  2x 

-X 

1  -1/Zx 

X 

* 

1 

X 

1 

=  S(x)  9<x)  5(x)-> 


As  long  as  e  in  T(e)  is  less  than  e(x)=(>/4xz+ 1  -1)/  2»xz  for  tiny  x,  the  two 
eigenvalues  must  remain  simple.  However,  as  e  gets  close  to  e(x),  not  only  do 
the  two  eigenvalues  get  close,  but  the  similarity  transformation  S(x)  which 
exhibits  the  decomposition  in  the  last  equation  gets  more  and  more  ill- 
conditioned.  That  is,  as  e-»e(x),  ||  5(x)||  •  ||  S(x)"l|l  The  ill-condition  of 
S(x)  is  numerically  significant  because  it  means  computing  S(z)AS(z)~l  in 
floating  point  arithmetic  is  apt  to  lead  to  large  errors  (this  phenomenon  will 
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be  discussed  more  in  Chap.  3).  Thus,  one  may  add  the  constraint  to  a  cluster¬ 
ing  that  the  matrix  5  which  exhibits  the  decomposition  must  have  a  condi¬ 
tion  number  less  than  some  tolerance  Tc: 

*(S)  »  ||S[|  •  ||S-*||  <  K  . 

At  this  point  the  reader  might  object  to  example  (1.3)  on  the  grounds 
that  the  2  by  2  block 

0  100 
0 

is  "obviously"  separate  from  the  blocks  containing  1,  1.001,  and  -1,  because 
the  oft  diagonal  zeroes  are  "obviously"  sacred.  We  can  quantify  this  intuition 
by  using  only  the  condition  number  of  the  best  conditioned  diagonalizing 
similarity  x(S)  which  displays  the  eigendecomposition  as  in  (1.2):  if  k(S)<Jc. 
then  the  decomposition  is  acceptable,  otherwise  it  is  not.  In  the  case  of 
(1.3),  which  is  already  diagonalized  as  much  as  possible,  we  may  take  5=/  so 
k(S)=1,  the  smallest  possible  value  of  k(S)  for  any  S.  This  criterion,  which 
generally  allows  a  finer  clustering  than  the  scheme  in  (l)-(3),  can  be  used  to 
decompose  matrices  in  preparation  for  computing  functions  of  them,  such  as 
the  exponential.  (This  type  of  decomposition  will  be  discussed  further  in 
Chapter  3.) 

Let  us  review  the  discussion  so  far  by  describing  a  program  to  compute 
the  eigendecomposition  of  an  uncertain  matrix 

(1)  Given  To  and  e,  we  must  cluster  the  spectrum  of  the  matrix  into  groups. 
As  stated  above,  there  are  two  possible  criteria  for  performing  the  clus¬ 
tering.  Whichever  one  is  chosen,  it  will  turn  out  that  we  need  only  con¬ 
sider  clustering  ff(r0)=o1ijOj  into  two  disjoint  pieces.  Given  such  a  clus¬ 
tering  we  must  be  able  to  compute  the  largest  1  such  that 
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Region  Clustering:  there  is  a  Jordan  curve  or  curves  J  dividing 
the  complex  plane  into  two  regions  such  that  the  groups  9t  and 
og  remain  on  opposite  sides  of  J  for  all  TeT*.  or 

Path  Clustering:  for  all  paths  T(x)  in  Tf,  for  all 

and  for  all  x. 

This  largest  f  will  be  called  the  (region  or  path)  dissociation  between  ox  and 
a*  and  denoted  by  diss^Oj.Oj,  T0, region)  or  diss£(ffi.92,7'0,pafh)  (or 
dissjr(ai.a2)  if  both  the  choice  between  ’’region"  or  "path"  and  f0  are  clear 
from  context).  The  subscript  E  indicates  that  perturbations  are  measured 
using  the  Euclidean  norm.  We  will  also  consider  diss2(or1  ,  <rz),  where  perturba¬ 
tions  will  be  measured  using  the  2-norm  (these  norms  are  defined  in  the  next 
chapter).  A  better  known  synonym  for  dissociation  is  separation,  but  separa¬ 
tion  has  already  been  used  for  related  quantities  [Stewart.  Varah]  which  will 
be  considered  in  the  next  chapter. 

(2)  Given  T&  t  and  a  clustering  o  -  (jo*.  b°w  ill-conditioned  must  Sr  be  if  it 

i 

exhibits  the  eigendecomposition 

T  =  SrdiagOi,  .  •  -  ,&b)Sf' 

where  TeT(e)  and  <r(04)  is  identified  with  at?  If  it  must  be  too  ill- 
conditioned,  then  we  need  to  combine  the  a4  from  step  (l)  into  larger 
groups. 

(3)  Given  a  cluster  a4  which  contains  more  than  one  distinct  eigenvalue  and 
cannot  be  split,  is  there  a  feT(c)  all  of  whose  eigenvalues  within  this 
group  are  equal?  If  so,  this  matrix  (or  at  least  its  existence)  should  be 
reported  to  the  user  as  output;  if  such  a  7'cT(e)  does  not  exist,  the  user 
should  be  told  that  this  part  of  his  problem  is  ill-posed.  This  problem  is 
addressed  by  Ruhe  and  Kfigstrdm  [Ruhe2,  KfigstrtJml],  and  we  will  not 
pursue  it  in  this  thesis. 


9 


Now  we  describe  the  contributions  of  this  thesis  to  the  solution  of  this 
problem.  We  were  not,  alas,  able  to  solve  the  problem  completely,  but  we 
have  made  substantial  progress  and  our  results  are  applicable  to  other  prob¬ 
lems  as  well. 

The  results  are  of  two  kinds.  Chapters  2  through  5  analyze  the  dissocia¬ 
tion  and  compute  both  upper  and  lower  bounds  for  path  and  region  dissocia¬ 
tion.  Both  upper  and  lower  bounds  are  attainable  or  nearly  so  for  various 
classes  of  matrices.  The  upper  and  lower  bounds  are  usually  close,  but  can 
be  very  far  apart.  Chapters  8  through  8  take  a  probabilistic  approach  to 
analyze  how  likely  the  bounds  are  to  be  close  or  far  apart,  and  show,  for 
example,  how  to  compute  the  probability  that  all  the  matrices  in  T(c)  will  be 
completely  diagonalizable.  Chapter  9  examines  the  applicability  of  the  proba¬ 
bilistic  model  to  finite  precision  calculations.  More  specifically,  the  results 
are  as  follows. 

Chapter  2  further  discusses  the  two  notions  of  clustering  (region  and 
path)  described  above.  In  particular  we  show 

diss(o, ,  eg  ,  path)  as  diss(oi ,  <rz  ,  rtgian)  , 
and  that  we  can  choose  a  matrix  norm  that  makes  the  two  dissociation  meas¬ 
ures  unequal.  We  also  define  the  simple  dissociation  measures  which  will  be 
combined  to  produce  bounds  on  diss(ei ,  eg  ,  rtgian)  and  diss(et ,  eg  ,  path). 
We  derive  basic  properties  of  these  measures,  in  particular  bow  they  behave 
under  similarity  transforms  of  the  matrix,  and  a  "divide  and  conquer"  pro¬ 
perty  that  makes  them  easier  to  compute  when  the  matrix  has  a  Dlock  diago¬ 
nal  structure.  We  also  present  an  upper  bound  on  dissg(<rt ,  og ,  path)  and 
diss^(ai  ,  ffg  ,  path)  based  on  one  of  these  measures. 
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Chapter  3  solves  the  problem  in  step  (2)  above  by  computing  an  Sf 
whose  condition  number  is  within  a  factor  of  VF  of  best  possible  (b  is  the 
number  of  partitions),  and  by  computing  explicit  upper  and  lower  bounds  on 
the  best  possible  condition  number  which  differ  by  at  most  a  factor  of  b .  We 
also  discuss  the  possibility  of  partitioning  £  subject  solely  to  the  criterion 
that  the  condition  number  of  the  diagonalizing  similarity  be  less  than  a 
threshold. 

Chapter  4  presents  a  new  lower  bound  on  diss2(0]  ,  aa  .  region )  which  is 
sharper  than  the  previously  best  bound,  compares  it  to  other  known  bounds, 
and  discusses  when  it  is  likely  to  be  sharp.  Combined  with  the  upper  bound 
of  chapter  2  this  yields  the  inclusion 

upper  bound  fc  <1133(0! ,  oz  ,  path)  >  diss(oj  ,  a2  .  region)  %  lower  bound 

Chapter  5  analyzes  how  far  apart  the  upper  bound  of  chapter  2  and  the 
lower  bound  of  chapter  4  can  be,  and  presents  worst  case  examples  which 
show  how  far  apart  the  bounds  must  be.  We  also  compute  diss8(0]  ,  <x2)  and 
disSjrCoj  .  o2)  exactly  for  normal  matrices,  in  which  case  all  four  notions  of 
dissociation  (path  or  region,  2*norm  or  Euclidean  norm)  are  equal. 

Chapter  6  presents  a  geometric/probabilistic  model  of  the  problem,  by 
defining  certain  sets  in  matrix  space  which  are  the  sets  where  the  eigenprob- 
lem  becomes  difficult  We  discuss  the  algebraic  and  geometric  properties  of 
these  sets,  which  are  algebraic  varieties,  and  put  a  probability  measure  on 
matrix  space  which  lets  us  analyze  what  fraction  of  matrix  space  consists  of 
bard  problems. 

Chapter  7  uses  the  model  of  chapter  8  to  compute  probability  distribu¬ 
tions  of  the  smallest  distance 
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from  a  random  matrix  to  one  with  a  given  rank  (such  as  the  nearest 
singular  matrix), 

from  a  random  matrix  to  one  with  a  given  Jordan  form  (such  as  the 
nearest  matrix  with  a  double  eigenvalue),  and 

from  a  random  polynomial  to  one  with  a  given  zero  structure  (such  as 
the  nearest  polynomial  with  a  double  zero). 

Chapter  B  uses  the  model  developed  in  chapter  6  and  the  results  of 
chapter  7  to  analyze  when  the  bounds  discussed  in  chapters  2  through  5  are 
likely  to  be  accurate.  We  show,  for  example,  that  the  ratio  of  the  upper  to 
lower  bounds  on  dissz(o1  ,  Og)  cannot  exceed  K>  1  except  on  a  set  of  matrices 
of  probability  proportional  to  K~2. 

Chapter  9  investigates  the  usefulness  of  the  probabilistic  model  of 
chapters  6  and  7  for  analyzing  the  performance  (speed  and  accuracy)  of 
algorithms  for  matrix  inversion,  eigendecompositions,  and  polynomial  root 
finding.  A  paradigm  for  analyzing  performance  is  presented,  which,  when 
applied  to  matrix  inversion,  yields  a  lower  bound  on  the  probability  distribu¬ 
tion  of  the  relative  error  in  Gaussian  elimination.  The  model,  because  it 
ignores  the  effects  of  finite  precision  arithmetic,  fails  to  provide  any  useful 
information  at  all  about  certain  algorithms  whose  performance  depends 
strongly  on  the  effects  of  finite  precision  arithmetic.  We  show  bow  extending 
the  model  to  take  finite  precision  arithmetic  into  account  could  be  used  to 
measure  how  many  problems  can  be  solved  as  a  function  of  the  amount  of 
extra  precision  carried  in  intermediate  computations. 
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Chapter  2:  Preliminary  Definitions  and  Lemmas 

2.1  Introduction 

In  this  chapter  we  introduce  the  notation  and  dissociation  measures 
used  in  the  rest  of  the  thesis.  These  dissociation  measures  will  be  used  later 
in  the  thesis  to  construct  upper  and  lower  bounds  on  diss(ffi  .  erg).  These 
upper  and  lower  bounds  can  be  far  apart;  just  how  far  apart  is  the  subject  of 
chapters  5  and  8.  However,  it  is  unlikely  that  they  are  very  far  apart; 
chapters  6  and  7  will  present  a  natural  model  for  "picking  a  matrix  at  ran¬ 
dom"  which  we  will  use  in  chapter  8  to  make  this  assertion  precise  and  prove 
it. 

The  rest  of  this  chapter  is  organized  as  follows.  Section  2.2  discusses 
diss(<7i  ,  og  ,  region)  and  diss(oi  ,  o2  ,  path).  (1155(0!  ,  a2  ,  path)  must  always  be 
at  least  as  large  as  diss(0,  ,  az  ,  region)  although  they  may  indeed  differ  in 
certain  circumstances.  We  also  show  that  they  provide  enough  information  to 
cluster  the  eigenvalues  in  the  way  discussed  in  chapter  1.  Section  2.3 
discusses  the  canonical  form  we  use  for  matrices.  Sections  2.4  and  2.5  define 
the  dissociation  measures  ||P||  (P  a  projector),  sep (A,B),  and  sepxCd.f?) 
and  discuss  their  basic  properties.  In  particular,  sep(A,B).  sepx(A,B)  and 
||P  1 1  share  certain  scaling  and  "divide  and  conquer"  properties  which  we 
later  exploit  to  compute  relationships  among  them. 

2.2  The  Difference  between  diss(0t ,  a2  ,  region)  and  diss(0! ,  o2  .  path) 

This  section  discusse  the  difference  between  diss(0i ,  a2 ,  region)  and 
dias(0] ,  08  ,  path),  proves  that  diss(0i ,  a2  ,  path)  i  diss(0! ,  0a  ,  region),  and 
shows  why  they  provide  sufficient  information  to  compute  the  decomposition 
of  chapter  1. 
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Let  us  restate  the  definition  of  diss(?j  ,  a2 ,  region).  The  definition 
depends,  of  course,  on  the  matrix  norm  |||-|||  which  defines  the  shape  of 
T(e)  =  \  T:  ||j  T-Tolll  <sj-  Consider  the  set  o(T(e))  of  all  eigenvalues  of  all 
matrices  T  in  T(e).  o(T(e))  is  an  open  set  and  can  be  written  as  the  disjoint 
union  of  its  connected  components.  ot  and  o2,  being  sets  of  eigenvalues  of  7o . 
must  lie  in  these  connected  components.  Let  o i(T(e))  be  the  union  of  those 
components  containing  points  of  and  0g(T(e))  be  the  union  of  the  com¬ 
ponents  containing  o2.  If  ffj(T(e))  and  o2(T(£:))  are  disjoint,  then  we  can  draw  a 
Jordan  curve  /(e)  having  0j(T(£))  strictly  inside  and  a2(T(e))  strictly  outside. 
As  we  increase  e,  ai(T(e))  and  o8( T(e))  will  grow  from  tiny  neighborhoods 
around  the  eigenvalues  when  e  is  near  0  and  eventually  intersect  for  e 
greater  than  some  e.  at  which  point  the  curve  /(e)  no  longer  exists.  This  e. 
the  supremum  of  the  set  of  e  for  which  separating  curves  /(e)  do  exist,  is  the 
definition  of  diss(a, .  a2  .  region)  (note  the  implicit  dependence  on  the  norm 

Ill-lit)- 

Now  we  define  (1133(9!  ,  og  .path).  Let  T(x)  be  a  continuous  path  starting 

at  T(0)=T0  and  remaining  inside  T(e)  for  all  xteO.  Let  X0=[X1 . A„]  be 

some  ordering  of  Tq's  eigenvalues,  possibly  with  repeated  entries  for  multiple 

eigenvalues.  We  wish  to  define  A(x)=[Ai(x) . A*(x)]  so  that  A(x)  is  a  list  of 

the  eigenvalues  of  Ttx)  and  a  continuous  function  of  x.  This  is  possible  since 
the  eigenvalues  are  continuous  functions  of  the  matrix.  The  only  ambiguity 
arises  when  some  some  Tfco)  has  a  multiple  eigenvalue  Xi(x0)=Xe(x0)  (say). 
In  this  case  one  may  arbitrarily  choose  which  eigenvalue  to  call  \t(x)  and 
which  to  call  Ag(x)  for  x>xq  (this  arbitrariness  will  not  affect  the  definition  of 
dissociation).  Suppose  that  A1(x,)=Xa(x2)  for  some  path  T{x)  and  possibly 
distinct  xt  and  z2.  In  the  language  of  the  last  paragraph,  this  means  that  the 
connected  components  of  o( T(e))  containing  \i  and  Ag  must  coincide,  since 
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both  contain  the  point  Xi(xl)=\c(zz).  Said  another  way,  if  Xj€(7|  and  X2eo2, 
then  diss(aj  ,  o2  ,  region)  <e,  because  X,  and  Xg  belong  to  the  same  region 
cluster  of  T(«).  For  path  clustering  we  make  a  more  stringent  requirement 
on  X(x).  that  X1(x0)=X2(z0)  for  a  particular  value  of  z0  and  some  path  T(x). 
In  other  words,  Xi(z )  and  X*(z )  must  be  able  to  achieve  the  same  value  simul¬ 
taneously.  Now  let  e  be  the  supremum  of  the  set  of  e  such  that  for  all  paths 
T(x)  in  T(e).  Xy(z)  never  equals  X*(z)  for  any  X1(Q)=X1ea1  and  any 
Xa(0)=X2ecr2.  This  e  is  the  definition  of  diss(oj ,  a2 ,  path )  (note  the  implicit 
dependence  on  the  norm  j||  •  |||  ). 

Why  have  we  bothered  to  draw  this  distinction  between 
diss(aj  ,  a2 ,  region)  and  diss(aj  ,  a*  .  path)?  The  example  in  the  next  para¬ 
graph  demonstrates  that  the  two  notions  of  dissociation  can  indeed  differ, 
provided  we  are  allowed  to  choose  a  matrix  norm  |||  -|||  other  than  ||  ||  and 
II  II  g.  We  do  not  know  if  diss(ffj  .  <r2  ,  region)  and  diss((jj  ,  <r2  ,  path)  can  differ 
if  HI  HI  is  one  of  ]|  -||  or  ||  -||  B\  this  is  an  interesting  open  question. 

For  our  example,  we  choose 


To* 


1 

0 


0 

-1 


and  a  norm  ||]  •  |||  whose  unit  ball  is  a  very  narrow  ellipsoid  pointing  in  the 


1  0 
0  1 


direction.  For  example,  we  may  take 


III7’|||2  =  2?(|7’i8|2+  17*,!*+  irn-r**!*)*  ^-|r„+7’*2|2 
where  i?»l.  The  idea  is  that  the  unit  ball  in  the  |||  |||  norm  contains  only 
matrices  close  to  a  multiple  of  the  identity,  so  that  points  in  T(e)  look  like 


plus  an  t  or  smaller  multiple  of  the  identity.  For  e  near  1.  we  get  essentially  a 
pencil  of  matrices  with  eigenvalues  fl+x  ,  —  l+r{,  \x\  s  1.  Thus,  the  region 
corresponding  to  X4=l  is  a  small  neighborhood  of  the  disk  of  radius  1  cen¬ 
tered  at  1,  and  the  region  corresponding  to  Xg=-1  is  a  neighborhood  of  the 
disk  of  radius  one  centered  at  -1.  These  regions  overlap  at  the  origin  and  so 
diss(<7i  ,  CTe ,  region)^  1.  However.  Xj  and  Xg  can  not  be  made  equal  by  pertur¬ 
bations  of  this  size;  indeed  Xi~Xg  remains  close  to  2  until  e  gets  close  to  B ■ 
Therefore  diss(a1  ,  <r2  >  path)  can  be  arbitrarily  larger  than 
diss(at  ,  o2 .  region)  if  we  are  allowed  to  choose  ||j  ■  jj|  other  than  ||  ||  and 

INI*. 

It  is  true  for  any  norm  |||  ■  ||| .  however,  that 

diss(oj ,  <ra  ■  path)  as  diss(0]  ,  og  .  region)  , 
simply  because  if  two  distinct  eigenvalues  can  be  made  equal  by  a  perturba¬ 
tion  of  size  diss(<7i ,  trz  ,  path),  then  no  Jordan  curve  can  be  drawn  separating 
the  regions  of  the  plane  in  which  they  lie. 

It  remains  to  show  why  being  able  to  compute  diss(<7i  ,  oz  ,  path)  (or 
diss(o!  ,  a2 .  region))  is  sufficient  to  cluster  the  spectrum  completely.  By 
clustering  the  spectrum  completely,  we  mean  finding  a  partition 
E  =  {(J, . <7*j  or  To's  spectrum  such  that 

Region  Clustering: 

E  is  the  finest  partition  for  which  we  can  find  Jordan  curves  surround¬ 
ing  disjoint  regions  of  the  complex  plane  containing  0((T(e))  (fft(T(e))  is 
the  component  of  a(T(e))  containing  o*). 

Path  Clustering: 

E  is  the  finest  partitioning  for  which  no  two  distinct  eigenvalues  X*£ffi 
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and  A,  eo j  can  be  continuously  transformed  to  a  common  value  X  along 
some  path  T{x)  in  T(e). 

£  is  well  defined  (in  both  cases)  because  of  the  following  property:  if  Ei  and 
Eg  are  any  two  partitions  satisfying  the  stated  criterion  (for  paths  or 
regions),  then  the  partition  Etn£a  (the  coarsest  common  refinement)  also 
satisfies  the  criterion.  Thus,  E  may  be  uniquely  defined  as  the  intersection  of 
all  partitions  satisfying  the  criterion.  (The  set  of  all  partitions  satisfying  the 
criterion  is  never  empty,  since  it  always  contains  the  trivial  partition  E={aj.) 
Now  note  that  diss(ffj  .  a2  ,  region)  (or  diss^  ,  o2  ,  path))  is  sufficient  to 
determine  if  E={<Ji  .  o2\  satisfies  the  criterion. 

2.3  Schur  Canonical  Form 

Throughout  this  thesis  we  will  ask  questions  of  the  form: 

"What  matrix  T  possessing  property  P  minimizes  ||  T-Tq\  \  ?" 

where  property  P  depends  only  on  the  Jordan  canonical  forms  of  T0  and  T.  It 
will  be  useful  to  know  what  transformations  we  may  perform  on  To  that 
either  do  not  change  this  minimum  distance,  or  change  it  in  an  easily  meas- 
ureable  way.  We  will  use  two  distance  measures,  the  2-norm  ||  ||  and  the 
Euclidean  (or  Frobenius)  norm  ||  ||  g,  which  we  now  define. 

Let 

11*11  =(£i*<!*)l/8 

«>i 

denote  the  Euclidean  length  of  the  n -vector  x.  Then  ||  T||  is  defined  as 


!l r||x » (El7^|8)^a  . 
u 


1 1  T 1 1  g  is  defined  as 
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Another  definition  of  ]|  ■  ||  g  we  will  need  later  is  the  following: 

liriu  *  (tr(rr*))1/z  •  (2.i) 

This  expresses  ||  ■  ||  g  as  the  norm  induced  by  the  inner  product 

<A,B>  *  tr (AB*)  (2.2) 

These  norms  have  the  following  well  known  properties  [Wilkinson2]: 


II  AH  <  \\A\\g*^n  IUH 

IU*II  *IUII  11*11 
\\AB\\gX  IUH  \\B)\S 

IU*IU*IUIU  1151! 

These  last  inequalities  immediately  imply 


(2.3) 


dissz(oi  ,  02)  ^  dissj(oj  ,  <xz)  «  Vn  dissgCo,  ,  az)  .  (2.4) 

We  also  define  the  condition  number  of  the  nonsingular  matrix  5  as 


K(S)  s  ||  5H  ||5-l||  • 

We  may  now  ask  how  much  our  answer  to  our  minimum  distance  ques¬ 
tion  changes  when  we  change  T0  to  ST0S~l: 

Lemma  2.1:  Let  |||-|||  denote  either  ||-||  or  IHU-  d  =  inf  !|j  T-ToHI , 

where  the  infimum  is  over  matrices  T  possessing  property  P,  then  if  S  is 

nonsingular  6S  =  inf  |||  T- SroS-l|ll  satisfies 
T 

6S  *  6  k(S)  (2.5) 

Proof:  Since  property  P  depends  only  on  the  Jordan  canonical  form,  T  has 
property  P  if  and  only  if  ST'S~l  does  as  well.  From  (2.3)  we  see 

-11  liurs-^STyrMII  «  |]|  t-t0\\\-k(s) 

whence  follows  (2.5).  Q.E.D. 

In  other  words,  transforming  F0  to  SToS~x  can  only  change  the 
minimum  distance  by  a  factor  of  at  most  *(S).  We  will  exploit  this  property 
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systematically  throughout  this  thesis. 

In  particular,  if  c(S)=l.  the  distance  cannot  change  at  all.  It  is  well 
known  that  k(S)=  1  if  and  only  if  5  is  a  scalar  multiple  of  a  unitary  matrix. 
Schur's  lemma,  which  we  quote  below,  tells  us  that  unitary  transformations 
are  enough  to  put  any  matrix  into  upper  triangular  form: 

Lemma  2.2  (Schur's  Lemma):  Given  any  n  by  n  complex  matrix  T  there  is  a 
unitary  matrix  Q  such  that  QTQ~l=U  is  upper  triangular.  Furthermore.  Q 
may  be  chosen  so  that  the  eigenvalues  appear  on  the  diagonal  of  U  in  any 
prescribed  order. 

Proof:  See  [Isaacson]. 

These  two  lemmas  tell  us  that  we  may  assume  without  loss  of  generality 
that  our  original  matrix  To  is  of  the  form 


where  a(A)-ax  and  a{B)~a2.  We  will  occasionally  have  need  of  a  related  uni¬ 
tary  canonical  form  where  A  is  upper  triangular  but  B  is  lower  triangular. 
This  form  is  obtained  from  (2.6)  by  reversing  the  order  of  the  last  dim(i?) 
rows  and  columns  of  To. 

2.4sep(J4,f?)  and  Projections 

Projections  and  their  norms  have  been  used  throughout  the  literature  as 
measures  of  the  sensitivity  of  an  eigenvalue  to  perturbations  [Schwarz, 
Kato2,  Kahanl,  Ruhel,  Wilkinson2,  Wilkinson3],  so  it  should  be  no  surprise 
that  we  use  them  here,  too. 

There  are  several  equivalent  ways  to  define  the  projection  P  associated 
with  ffi-  We  will  use  the  following  two: 
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As  described  in  the  last  section,  we  assume  T0  is  in  the  form  (2.8),  where 
o(A)=al,  and  C\<Tz  =  <f> ■  We  also  assume  B  is  lower  triangular.  Let 

n>(=dim(A)  and  ns=dim(j9).  Now  consider  the  system  of  linear  equations 

AR  -  RB  -  C  (2.7) 

which  we  want  to  solve  for  R.  It  is  easy  to  verify  that  if  we  renumber  the 

entries  of  the  nA  by  ng  matrices  R  and  C  so  that  the  first  column  is  num¬ 
bered  1  to  nA  from  top  to  bottom,  the  second  column  from  n^  +  1  to  2 nA,  and 
so  on,  then  equation  (2.7)  can  be  rewritten  as  [Varah] 


(A®/  -  l®Br)Rf  s  jR'  =  C  .  (2.8) 

®  denotes  the  Kronecker  product,  and  R'  and  C"  are  the  reordered  versions 

of  R  and  C  (they  are  nA  ng  dimensional  column  vectors).  The  matrix  ^Aj  is 

a  square  nA  nB  dimensional  upper  triangular  matrix  with  diagonal  entries 

\(A)-\j(B).  In  other  words.  A  and  B  have  a  common  eigenvalue  if  and  only 

if  *4^  is  singular,  a  case  we  rule  out  by  insisting  <7inff8  =  For  example,  if 

B=\Ba\  is  3  by  3,  then 

A—Bw'I  —BziI  ~B3i~I 

*ajb  ~  A-Bzt  l  -B&  I  (2.9) 

A-B*  I 

Thus,  we  may  solve  (2.7)  for  R  given  any  C.  Now  observe  that 


PT  ~  TP  =  jj  (2-11) 

P  projects  onto  the  invariant  subspace  belonging  to  Note  that 

||/»||®=i +  || /?||*  =  ||/-P||*. 
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Now  define  S  as 


5  = 


\I  -R 

1 


=  [SilS8]  . 


(2.12) 


where  S\  consists  of  the  first  nA  columns  of  5  ([/ 1  0]r)  and  Sg  consists  of  the 
remaining  ng  columns  ([-/?r  |  /  jr).  Then  it  is  easy  to  verify  that 


S~'TaS 


'o^-l  b\ 

In  fact,  any  S'  which  diagonalizes  Tg  as  in  (2. 13): 


-(■ 


S-'ToS' 


■N 


with  a(j4’)=aj  and  o(B')=og  is  easily  shown  to  be  of  the  form 


S' 


■»N 


(2.13) 


(2.14) 


(2-15) 


where  D1  and  D2  are  conforming  nonsingular  matrices.  Conversely,  any  S'  of 
the  form  (2.15)  satisfies  (2.14). 

Given  5  as  in  (2.12).  we  express  its  inverse  as 


1  _ 


ICS-1)®, 


(2.16) 


where  (5~1)^  contains  as  many  rows  as  Sj  contains  columns.  Then  P  may  be 
written 


P  =  S,  •  (5_l)(1)  :  (2.17) 

This  is  equivalent  to  (2.10). 

These  facts  will  be  important  in  chapter  3,  where  we  exhibit  D\  and  Dg 
which  minimize  the  condition  number  of  S'. 

A  scaling  property  of  ||  /’ll  analogous  to  the  scaling  property  in  lemma 
2.1  follows  from  this  last  expression  for  P\ 

Lemma  2.3:  If  P  is  the  projector  of  To  corresponding  to  ax,  and  P'  is  the  pro* 
jector  of  UTq  U~x  also  corresponding  to  ax,  then 
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||  p\\  *  K(V)  IJP'li  .  (2.18) 

Proof:  If  P=S,  (S'1)05.  then  P'=  US x  ((S'*)™  U~x)  =  UPU~\  The  result  follows 
by  taking  norms.  Q.E.D. 


Now  that  we  have  defined  the  projection  F,  we  turn  our  attention  to 
sep(j4,B).  Following  Stewart[Stewart],  we  define 

Definition  2.4  The  separation  of  two  matrices  A  and  B  is  denoted  sep(j4,2?) 
and  equals 


sep(. A.B)  «  inf 
R*c 


\AR-RB\\g 


(2.19.a) 


*  (11*^11)-* 


(2.19.b) 


St  the  smallest  singular  value  of  ^/a.B  (2.19.c) 

*  the  distance  (measured  either  with  ||  ■  ||  or  ||  •  ||  g)  (2. 19.d) 


from  to  the  nearest  singular  matrix  . 

If  A  and  B  are  clear  from  context,  we  will  abbreviate  sep(A.B)  by  sep. 

sep(A.B)  has  several  important  properties  which  we  now  enumerate. 
First  of  all  sep(A,B)=sep(B ,A);  this  is  because  'it#  can  be  obtained  from 
~*BA  by  simply  reordering  the  rows  and  columns.  More  importantly  we  have 

Lemma  2.5:  sep(^.P).  ||  P||  and  ||  C|j  s  satisfy  the  following  inequalities: 


ll<r»*sn.  I'7-1'* 

sep'  sep® 

(2.20) 

II  />))  <  i  +  i  +  H-£aIL*. 

sep  sep 

(2.21) 

Proof:  From  (2.8)  follows  ||  R ||  g  £  ||  C||  */sep.  Since  ||  P|| 2  =  l  +  j|  P||  * 
(2.20)  follows.  (2.21)  is  simply  a  coarsening  of  (2.20).  Q.E.D. 
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Thus,  sep(.A.f?)  (with  ||  Toll  E)  provides  an  upper  bound  on  the  norm  of 
the  projection  associated  with  either  A  or  B,  and  conversely,  \\P\\  provides 
a  lower  bound  for  sepCd.i?). 

The  next  property  shows  that  sep  satisfies  a  property  analogous  to 
Lemma  2.1.  We  could  hardly  expect  to  be  able  to  use  sep  to  help  measure  dis¬ 
tances  (and  dissociation)  if  it  did  not  behave  the  same  way  under  transfor¬ 
mations  as  do  distances. 

Lemma  2.6  ([Stewart]): 

”V(S*AS*1'S*BSb')  *  sep (A.B)  k(Sa)  k(Sb)  (2.22) 

Proof: 


sep  (SaASa\SbBS£')  =  rnf  • 


=  inf  - 
R+C 


saas;'r  -  rsbbs§",\e 

II  R\\s 

sa(a(sa1rsb)  -  ( s;'RSB)B)Sg'\\ e 


*  11^11-1154 


r  1 1 


*  115*1 

\Sb\\-\\S£1\\-M 


SaMI  inf 


11*11* 

\\A(S£'RSb)-(Sa-'RSb)B\\e 


II  *11* 

\\MSa1RSb)-(S£'RSb)B\\b 

\\Sa-'RSb\\e 


=  K(SA)  ic{SB)  sep{A.B)  . 

This  proves  the  second  inequality  of  (2.22).  The  first  inequality  follows  by 
symmetry.  Q.E.D. 

The  next  lemma  shows  that  we  may  apply  the  "divide  and  conquer"  para¬ 
digm  to  computing  sep(i4,5)  when  either  A  or  B  is  block  diagonal.  We  write 
A  =  when  A  is  block  diagonal  with  blocks  Aa  . 

Lemma  2.7  [Stewart]: 
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sep(©4  .  ®5j)  =  min  sep (4.5,)  (2.23) 

t  J 

Proof:  It  suffices  to  show  sep(/4.i?i©i?2)  =  mm  sep^.iJj.  From  (2.9)  we  see 
that  if  B  is  block  diagonal,  so  is  ^  &  with  diagonal  blocks  Since  the 

singular  values  of  any  block  diagonal  matrix  are  the  singular  values  of  the 
blocks,  the  results  follows  directly  from  (2. 19.c)  in  Definition  2.4.  Q.E.D. 


Lemmas  2.6  and  2.7  together  tell  us  that  if  A  and  B  can  both  be  com¬ 
pletely  diagonalized  by  not  too  ill-conditioned  similarities  SA  and  Sg  (that  is. 
neither  k(Sa)  nor  k(Sb)  is  very  large),  then  sep(,4,2?)  differs  from 
min|\(i4)  —  Xj(i?)|  by  the  not  too  large  factor  k(Sa)  k(Sb).  The  expression 

min|  X*(vt)  -  \j(B)  | .  the  smallest  difference  between  an  eigenvalue  of  A  and 

an  eigenvalue  of  B,  is  the  coarsest  possible  measure  of  the  dissociation 
between  o{A)  and  o(B).  We  record  this  fact  as 

Lemma  2.8:  If  S/1AS’i|=diag(At(yl))  and  5jf,l?5a=diag(\^ (B)),  then 


min  |X<(^4)  -  \j(B)  | 

nun  I  X.U)  -  X, (B)  |  !=  sep (A.B)  *  .  ,(Sj) - 

Proof:  Combine  Lemmas  2.0,  2.7  to  obtain  the  lower  bound,  and  Definition  2..d 
to  obtain  the  upper  bound.  Q.E.D. 


We  will  show  in  chapter  B  that  the  likely  situation  is  that  A  and  B  are 
diagonalizable  with  reasonably  well  conditioned  similarities  so  that 
niin|\(i4)  -  A^(i?)|  is  a  reasonably  accurate  estimate  of  sep(.4.i?)  and 

diast(o,  ,  a8). 

We  present  one  more  application  of  the  divide  and  conquer  paradigm 
used  in  Lemma  2.8:  if  A  and  B  are  block  diagonal,  the  computation  of  R 
where  AR  -  RB  -  C  can  also  be  broken  into  smaller  parts.  We  illustrate  when 
both  A  and  B  have  two  diagonal  blocks.  The  system  of  equations 
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Ul  1  fall  #12  #11  #12  #1  c„  Ciz  _ 

^  |  j4gj  [#2j  #gg  ^?21  #22  #2  ^21  ^22  ^ 

(where  ail  blocks  are  of  conforming  sizes)  breaks  into  four  independent  sys¬ 
tems: 

-BijBj ,  =  q,  (2.24) 

If  >4  or  2?  has  more  than  two  blocks,  equation  (2.24)  still  applies. 

2.5sepA(A,27) 

Another  measure  of  the  separation  of  two  matrices  is  sepA(i4,ff).  the 
smallest  perturbation  to  A  and  B  which  causes  them  to  have  a  common 
eigenvalue.  sep^A.f?)  will  be  our  upper  bound  on  the  dissociation  between 
and  02,  and  chapters  5  and  8  of  this  thesis  will  analyze  how  much  sep  and 
sepA  may  differ. 

Modifying  a  definition  of  Varah  [Varah]  slightly,  we  define 
Definition  2.9: 

sepxU.S)  a  inf  max(||  (A-XI)-1 1|  "l  .  ||  (5-\7)_1||  _1)  (2.25.a) 

=  inf  max(omln(A -XI ) ,  a^B-XJ))  (2.25.b) 

where  o^n  denotes  the  smallest  singular  value. 

Varah  defines  sepA  as  the  sum  of  the  two  singular  values  rather  than  the 
maximum,  so  his  sepx  cannot  differ  from  ours  by  more  than  a  factor  of  2.  We 
have  modified  his  definition  because  it  lets  us  state  slightly  sharper  results 
later  on. 

sepxM.i?)  is  clearly  an  upper  bound  on  dissg(ai  ,  o2  ,  path).  Of  course, 
there  might  be  a  general  perturbation  of  Tq  (not  just  one  in  A  and  B)  of 
much  smaller  norm  than  sepA(A,27)  that  makes  an  eigenvalue  of  A  coalesce 
with  one  of  B,  so  in  general  sepA(A,#)  only  provides  an  upper  bound  on 
dissg(<ri  ,  ?2  •  poth).  Since  chapter  8  contains  the  rather  surprising  result 
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that  sepx  provides  a  not  too  pessimistic  overestimate  of  this  dissociation,  we 
record  this  fact  in  the  following  theorem. 

Theorem  2.10: 

sepx(A,2?)  ^  dissgCffi  ,  ffg  ,  path) 
y/Z  sep x(A.B)  >  diss*(ff, ,  az  ,  path) 

The  rest  of  this  section  proves  several  properties  of  sepx  that  we  will 
need  later.  In  particular,  we  will  prove  two  lemmas  analogous  to  lemmas  2.6 
and  2.7.  Since  we  are  going  to  relate  sepx  and  sep,  it  is  important  that  they 
behave  similarly  under  transformations  (lemmas  2.6  and  2.11)  and  divide  and 
conquer  (lemmas  2.7  and  2. 12). 

First,  however,  we  present  a  characterisation  of  sepx  analogous  to 
Definition  2.4  of  sep.  The  following  lemma  is  essentially  due  to  Varah  [Varah]: 

Lemma  2.11: 

sep X(A.B)  <  mf  11  ^l/gnT  ^  2  SePx^’^  (2.26) 

Proof:  The  inflmum  in  definition  2.9  of  sepx  is  clearly  attained  for  some  X  by 
compactness.  Thus,  there  exist  unit  vectors  u  and  v  such  that 

sepj^A.f?)  =  max  (gmtX4-X) .  ffnan(S-X)) 

=  max  (||i4u-Xu||  .  ||  v*£?-v*X|| )  . 

Let  R-ioj* .  Note  that  rank(/?)=  1  and  ||  J?||  »  ||  /? ||  g  -  1.  Thus 

\\AR-RB\\g  =  ||  ( A-k)R  -  *(*-*)  II  g 

=  ||  (Ju-ku)vm  -u(u*£-t/"X)||  g 
<  ||  (Ai-Xu)v*||x  +  ||u(v*i?-t/*X)||x 
<2  sep X(A.£) 

This  proves  the  second  inequality  in  (2.26). 
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To  prove  the  other  inequality,  we  again  write  R  as  uv* ,  where  u  and  v 
are  unit  vectors  and  wm  attains  the  inflmum  in  (2.26)  (this  is  again  possible 
by  compactness).  Then  as  before 

||  AR-RB\\  !|  (Au -Xu)v*  -u{y*B-v*\) ||  ^  . 

We  now  exploit  the  alternative  definition  of  1 1  •  1 1  g  given  in  (2. 1)  above: 

||  AR-RB\\ $  as  trj[(i4u-Xu)i/*  -  u(v*B— u*X)][(i4u-Xu)n*  -  u(v*Z?  — i/*X)]*  j 

=  tr[U'  -B')(A'  -B')*] 

=  tr(j4'i4'*)  +  2-Re  lr(A'B'*)  +  tr (B’B'*) 

=  ||  y4-||  f  +  2-Re  trU ■£-)  +  ||**||f  . 

Now  if  we  choose  X  =  v*Bu  (or  u*Au),  A ’  and  B'  are  orthogonal: 

tc(A'B‘m)  =  tr(ix(t;*2?— v*\)v(Au-\u)*) 

=  tr(u(v*Bu  -\)(Au-\u)m) 

*0  . 

Thus  with  this  choice  of  X 

||  AR-RB\\ %  =  ||  (Au-\u)V  ||  1  +  ||  u(WF-i/*X)|!  J 

asmax{!|(A-X)u||*  ||  v*(£-X)||  g)* 

*  ■  sep l(A.B) 
as  desired.  Q.E.D. 

An  immediate  consequence  of  the  definition  of  sep(A,£7)  and  this  last 
lemma  is 

Lemma  2.12: 

sep (A.B)  s  2  sep \(A.B) 

If  dim(A)=l.  then 

sep  X(A.B)  as  sep  (A.B)  =  •  /)  <  2  sepx(A,5) 

where  j4s[an]-  An  analogous  inequality  holds  if  dim(£)=l. 
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Proof:  The  first  inequality  follows  from  lemma  2.11  and  the  first  line  of 
Definition  2.4  of  sep.  The  second  inequality  holds  because  if  dim(.4)=l  then 
the  R  in  Definition  2.4  is  either  a  row  vector  or  column  vector,  and  so  neces¬ 
sarily  of  rank  one.  Q.E.D. 

Just  how  much  smaller  sep  can  be  than  sepA  is  the  subject  of  chapters  5 
and  8. 

The  next  lemma  shows  that  sepx  behaves  similarly  to  sep  under  transfor¬ 
mations. 

Lemma  2.13:  Let  H  =  max  (/c(5^)  .  k(Sb)).  Then 

—  *  sep^AS/1  ,  SgBSg1)  <  sep^A.B) -Tt  .  (2.27) 

It 

Proof:  For  any  X 

liiArxriiiiu  (2  28) 

e  *  k(sa) 

*  ||(54AS/1-X)-1|I 

*  II  u-x)- Ml  -1-*K54)  *  II  U-xm  “  JE  ■ 

An  analogous  inequality  bolds  for  B  and  Sg.  The  lemma  follows  directly  from 
these  inequalities.  Q.E.D. 

There  is  an  important  difference  between  lemmas  2.6  and  2.13:  while 
sepx  may  change  by  It  after  a  transformation,  sep  may  change  by  as  much  as 
K2.  Although  we  do  not  exploit  this  difference  further,  it  bolsters  our  conjec¬ 
ture  of  chapter  B  that  sep/  ||  Tq\\  g  is  almost  always  bounded  below  by  a  con¬ 
stant  times  (sepx/  ||  Toll  *)*■ 

The  next  lemma  shows  that  the  same  divide  and  conquer  formula  holds 
for  sepx  with  block  diagonal  A  and  B  as  for  sep. 
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Lemma  2. 14: 

sepx(©4,  ,  ®Bj)  -  min  sepx(4i  .  Bj)  (2.29) 

Proof:  The  singular  values  of  a  block  diagonal  matrix  are  the  singular  values 
of  the  blocks.  Thus 

sep .  ®Bj)  =  inf  max(amln(©(4i-X))  .  aaiD(®(Bj -X))) 

=  inf  max(min  a^Ai-X)  ,  min  omm0B’J-X)) 

*  i  i 

-  nun  inf  maxC^U-X)  .  ff  -X)) 

=  minsepxU  .  P,) 

as  desired.  Q.E.D. 

In  analogy  to  Lemma  2.6  we  have 

Lemma  2.15:  If  S/1ASi4=diag(Xt(A))  and  S£'IBSi>=diag(\i(.3)).  then 

min  IX^U)  -Xj(5)|  min  IXtU)  - X,(£)| 

2  *  Sep*(i4>i?)  *  2^max  (c(S,)  .  k(Sb)) 

Proof:  The  upper  bound  follows  from  the  definition  of  sepx  and  the  lower 
bound  from  the  last  two  lemmas.  Q.E.D. 
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Chapter  3:  Best  Conditioned  Diagonalizing  Similarities 
3.1  Introduction 

In  this  chapter  we  assume  a  partitioning  E=|Oi . atJ  of  a  has  been 

chosen,  and  ask  the  following  question:  what  is  the  best  conditioned  S  such 
that 

S~lTS  =  diag^j _ 9fc)  (3.1) 

and  We  need  to  use  the  answer  as  a  tool  in  later  chapters. 

Actually,  it  is  as  easy  to  answer  a  more  general  question:  how  ill- 
conditioned  must  a  matrix  S  be  if  its  columns  are  constrained  to  span  cer¬ 
tain  subspaces?  We  answer  this  question  in  order  to  find  nearly  best  condi¬ 
tioned  matrices  Sg  and  Si  that  block  diagonalize  a  given  matrix  pencil 
T  -  A+\B,  i.e.  S£lSg  =  0  is  block  diagonal.  We  show  that  the  best  condi¬ 
tioned  Sg  has  a  condition  number  approximately  equal  to  the  cosecant  of  the 
smallest  angle  between  right  subspaces  belonging  to  different  diagonal 
blocks  of  9.  Thus,  the  more  nearly  the  right  subspaces  overlap  the  more  ill- 
conditioned  Sg  must  be.  The  same  is  true  of  Si  and  the  left  subspaces. 

For  our  original  problem  T  —  A—XJ,  the  standard  eigenproblem.  Si  -  Sg 
and  the  cosecant  of  the  angle  between  subspaces  turns  out  to  be  the  norm 
1 1  P 1 1  of  the  projection  associated  with  each  subspace.  More  precisely,  if  is 
the  projection  associated  with  then 

max  ||  Pi  I)  *  k{Soptiual)  *  '  max  ||  Pt  ||  (3.2) 

where  b  is  the  number  of  blocks  in  (3.1).  Furthermore,  we  can  construct  an 
S,  denoted  Sauwo-  whose  condition  number  k^Scrthq)  lies  in  the  bounds 
given  by  (3.2):  choose  the  rankfPj  columns  of  Sanyo  which  span  the  invari¬ 
ant  subspace  belonging  to  to  be  orthonormal. 
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In  particular,  if  b  =2  so  that  we  are  dividing  T  into  just  two  blocks,  then 
we  can  compute  an  S  (not  Sqrtho)  such  that 

k(S optimal)  ~  *(S)  =  ||  P\\  +  -'/||  P\\  8  -  1 

and 

s-'rs.s-'f*  j  ■ 

We  will  need  this  construction  for  our  lower  bound  on  diss^Oj  ,  oz)  in  the  next 
chapter. 

The  rest  of  this  chapter  is  organized  as  follows.  Section  3.2  defines  the 
notions  of  invariant  subspaces  and  angles  between  them  more  precisely, 
reviews  some  of  the  history  of  the  problem,  and  summarizes  the  results  of 
the  chapter.  Section  3.3  shows  how  to  decompose  T  into  b=2  diagonal 
blocks,  and  section  3.4  handles  the  case  of  6fc3  blocks.  Section  3.5  contains 
the  proof  of  a  technical  result  needed  in  the  proof  of  Theorem  3.1.  Section 
3.8  applies  the  main  results  to  an  error  bound  for  computing  a  function  of  a 
matrix,  such  as  exp(T).  Section  3.7  discusses  the  possibility  of  partitioning  a 
using  only  projection  norms  as  a  criterion.  Finally,  section  3.8  presents  some 
applications  of  the  main  results  unrelated  to  eigenproblems. 

Most  of  this  chapter  has  been  published  already  [DemmelJ,  except  for 
sections  3.7,  3.8  and  part  of  3.4. 

3.2  Definitions  and  Summary  of  Results 

Two  measures  of  the  ill-conditioning  of  the  eigenvalues  of  a  matrix  have 
appeared  frequently  in  the  literature.  One  is  the  condition  number  of  a 
matrix  5  which  (block)  diagonalizes  T  under  similarity  (  i.e.  S~lTS  is  block 
diagonal),  and  the  other  is  the  norm  of  the  projection  matrix  Pi  belonging  to 
the  spectrum  of  the  i-th  diagonal  block  of  S“l7S  (if  the  i-th  block  is  1  by  1, 
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the  norm  of  Pi  is  usually  denoted  1/  |  st  |  [Wilkinson2]).  Many  authors  have 
shown  that  the  larger  the  condition  number  of  S,  or  the  larger  the  norm  of 
Pi,  the  more  sensitive  to  perturbations  are  at  least  some  of  the  eigenvalues 
of  T.  Bauer  and  Fike  [Bauerl],  Kato  [Kato2],  Kahan  [Kahanl],  Ruhe  [Ruhel], 
Wilkinson  [Wilkinson2,  Wilkinson3]  and  others  have  all  contributed  theorems 
stating  this  result  in  different  ways.  Recently  Sun  [Sun]  has  extended  many 
of  these  results  to  regular  matrix  pencils. 

Our  goal  in  this  paper  is  to  show  that  these  two  measures  of  ill- 
conditioning  are  nearly  equivalent.  We  state  our  result  in  terms  of  angles 
between  subspaces  because  this  makes  sense  for  pencils  T-A+KB  as  well  as 
the  standard  eigenproblem  T~A-\1:  the  condition  number  of  the  best  5 
which  displays  the  block  structure  is  within  a  small  constant  factor  of  the 
cosecant  of  the  smallest  angle  between  a  subspace  belonging  to  one  diagonal 
block  and  the  subspace  spanned  by  all  the  other  subspaces  together.  In  the 
case  of  the  standard  eigenproblem  this  cosecant  turns  out  equal  to  the  larg¬ 
est  of  the  norms  of  the  projections  Pt. 

We  exhibit  a  best  S  for  decomposing  T  into  two  blocks  and  compute  its 
condition  number  exactly  in  terms  of  the  norm  of  a  projection  (see  part  2 
below).  This  result  was  obtained  independently  by  Bart  et  al.  [Bart]  and 
improves  an  earlier  estimate  of  Kahan  [Kahanl].  Wilkinson  [Wilkinson2,  p  89] 
and  Bauer  [Bauer4]  relate  the  two  measures  when  S~l  TS  is  completely  diag¬ 
onal;  we  generalize  their  results  to  diagonal  blocks  of  arbitrary  sizes  in 
theorems  3.3  and  3.3a  below. 

The  angle  between  subspaces  is  defined  as  the  smallest  possible  angle 
between  a  vector  u  in  one  subspace  S1  and  a  vector  v  in  another  subspace 
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i^S1  .  S? )  =  min  jarccos  |u*u  J  whenueS1  .  vetf  ,  |j  u  ||  *  ||  v  ||  =1  {(3-3) 
(d  will  be  discussed  more  fully  later). 

If  S1 . Ef  is  a  collection  of  subspaces,  the  space  spanned  by  their 

union  is  denoted  spanjS1 . S*  {. 

With  this  preparation,  let  us  consider  the  subspaces  associated  with  the 

block  diagonal  matrix  SilTSg  =  9  =  diag(9j . 0k),  where  0t  is  rt  by  ct;  rt 

and  Ci  must  be  equal  unless  T  =  A+kB  is  a  singular  pencil  [Gantmacher], 
FVom  Si1TSji=9  follows  TSg  -Si  9  which  implies  that  T  maps  the  space  Sj£ 
spanned  by  the  first  cx  columns  of  Sg  into  a  space  S l  spanned  by  the  first 
columns  of  Si .  Similarly,  columns  ■  +ci_i+l  to  Cj+  ■  •  ■  +ct  of  Sg  span 

a  space  S&  that  T  maps  into  a  space  S£  spanned  by  columns  r4-t-  ■  -  +rt_i+l 
to  rj+  •  •  •  +r4  of  Si-  Stewart  [Stewart]  calls  the  pairs  .  S[  deflating  pairs 
since  they  deflate  T  to  block  diagonal  form.  For  the  standard  eigenproblem 
T-A-XI  we  have  Sjf  =  S£  [Gantmacher]  in  which  case  they  are  denoted  by 
and  called  invariant  subspaces  and  then  no  generality  is  lost  by  assuming 
Sg  *  SL.  Henceforth  we  drop  the  subscripts  R  and  L  of  5  since  they  are 
unnecessary  for  the  standard  eigenvalue  problem  and  since  our  results 
apply  to  each  case  separately  for  the  general  problem  T  -  A+\B. 

Our  problem  is  to  choose  the  columns  of  S  to  minimize  x(S )  subject  to 
the  condition  that  the  columns  span  the  subspaces  Sf.  (It  is  not  important 
for  the  proofs  of  our  results  that  the  $  be  defined  by  an  eigenvalue  problem; 
we  ask  only  that  the  S1  be  linearly  independent  and  together  span  all  of 
euclidean  space.  Thus  our  results  may  be  interpreted  as  results  on  one-sided 
block  diagonal  scaling  of  matrices.)  Our  first  result  will  be  that  by  choosing 
the  columns  spanning  each  subspace  to  be  orthonormal,  we  will  have  an  5 
whose  condition  number  is  within  a  factor  VF  of  optimal,  where  b  is  the 
number  of  diagonal  blocks  of  9: 
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*(*5cwiwo)  s  K(S optimal)  (3.4) 

S ortho  denotes  any  matrix  S  whose  columns  are  orthonormal  in  groups  as 

described  above,  and  S&tjhal  denotes  any  matrix  S  whose  condition  number 

is  as  small  as  possible.  This  extends  a  result  of  Van  der  Sluis  [vanderSluis] 

where  all  subspaces  S1  are  one-dimensional.  Van  Dooren  and  Dewilde  [Van- 

Dooren]  have  also  shown  the  choice  of  Sqrtho  is  nearly  best,  and  in  fact 

optimal  if  the  subspaces  S1  are  orthogonal 

Furthermore,  we  shall  bound  ic(Sortho)  above  and  below  in  terms  of  the 
angles  between  the  subspaces  spanned  by  its  columns.  Let  dt  denote  the 
smallest  angle  between  S*  and  the  subspace  spanned  by  all  the  other  sub¬ 
spaces  together: 


=  t^S1  .  spanJS?  |)  . 

(3.5) 

We  shall  show 

max  (esc  +  Vcsc*  dj  -  1)  :£  k(S optimal) 

(3.6) 

<  ic(Sortho)  s 

*v/  i)csc8di 

^  t«i 

When  b  =2  (i.e.  we  have  only  2  diagonal  blocks)  Scrtho  is  in  fact  optimal, 

and 

k(S ortho )  =  *(Scptiual)  -  esc  0  +  VcscB  d  -  1  =  cot  1 9/  2  .  (3.7) 

S optimal  is  not  unique,  and  we  compute  another  5  for  the  b  =2  case  which  has 

the  optimal  condition  number  and  which  further  satisfies 

s-M*-N  ° 

in  the  case  of  the  standard  eigenproblem  where  S'  is  the  invariant  subspace 
belong  to  A,  and  &  the  invariant  subspace  for  B. 

For  the  standard  eigenproblem  we  also  have  esc  *  ||  Pt\\ ,  where  Px  is 
the  projection  associated  with  subspace  i.  It  follows  from  (3.6)  that  the  two 
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measures  of  ill-conditioning  KiScmuAl)  and  max||Pi||  we  wanted  to  show 
nearly  equivalent  can  differ  by  no  more  than  a  constant  factor. 

max  ||  ^ll  £  KiSapntuu.)  *  &  •  max  ||  Pt  ||  .  (3.8) 

3.3  How  to  Decompose  T  into  2  blocks 

In  this  section  we  show  that  the  best  conditioned  5  whose  first  c 
columns  span  a  given  subspace  S1  and  whose  remaining  n— c  columns  span 
another  given  complementary  subspace  Sf  has  condition  number 

rfSgpi imal)  ~  csc  ^  +  >/cscz  d  -  1  =  cot  ■#/  2  (3.9) 

where  iJ  =  d(S' ,  SF).  Note  that  we  assume  S1  and  S?  are  linearly  independent, 

for  otherwise  5  would  be  singular. 

To  prove  (3.9)  we  will  need  a  technical  result.  Theorem  3.1,  that  bounds 
the  norms  of  submatrices  of  a  positive  definite  matrix  in  terms  of  its  condi¬ 
tion  number.  Theorem  3.1  is  a  slight  generalization  of  an  inequality  of 
Wielandt  [Bauer2]  and  the  proof  technique  used  here  yields  several  other  ine¬ 
qualities  (Theorem  3.4)  one  of  which  (3.55)  is  an  inequality  of  Bauer  [Bauer3], 


be  a  Hermitian  positive  definite  matrix,  partitioned  so  that  A  is  n  by  n,  B  is 
n  by  m,  and  C  is  m  by  m.  Let  k  =  ||  /Z||  ||  //“Ml  be  the  condition  number  of 
H.  Let  X~ut  denote  any  matrix  such  that  =  X~l. 

Theorem  3.1:  It  H  and  k  are  defined  as  above,  then 


or,  equivalently, 


II  (A~l/t)*BC~ut\\  * 


(3.10) 


(3.11) 

Furthermore,  this  bound  is  sharp.  In  fact,  given  any  n  by  m  matrix  Z  such 
that  ||  Z\\  <1,  both  sides  of  inequality  (3.10)  are  equal  for  the  matrix 

*  *  >1 

This  theorem  will  be  proved  in  Part  3.5. 

We  also  need  another  definition  of  the  (smallest)  angle  d  between  sub¬ 
spaces  that  is  more  useful  than  the  one  stated  in  the  introduction.  As  stated 
there.  d  is  the  smallest  possible  angle  between  a  vector  in  one  subspace  and 
a  vector  in  the  other  subspace  (the  largest  possible  angle  may  be  much 
larger  than  the  smallest  if  the  subspaces  are  not  one  dimensional).  If  is  an 
n  by  c  matrix  of  orthonormal  columns  which  form  a  basis  of  S1  and  S2  is  an 
n  by  n-c  orthonormal  basis  of  the  second  space  rf.  then  d  may  also  be 
expressed  as  [Davis] 

^S1 .  Jf)  =  arccos  ||  S^Sall  =  arccos  sup  |y*5»15ax  |  (3.12) 

=  inf  arccos  |  | 

w.v 

where  the  sup  is  over  arbitrary  unit  vectors  z  and  y,  and  where  the  inf  is 
over  unit  vectors  it  in  S1  and  v  in  . 

Now  consider  a  candidate  matrix  S\ 

S ortho  *  [•S'i  I  (3.13) 

where  S\  and  5g  are  orthonormal  bases  of  S1  and  &  respectively.  We  may 

describe  every  other  possible  5  whose  columns  span  S1  and  ?  in  terms  of 

S ortho- 

Sg  ~  S ORTHO  D  =  S ortho  diag(Z?i  •  ^a)  =  |  5*XJ8]  .  (3.14) 

where  D\  is  a  nonsingular  c  by  c  matrix  and  D2  is  a  nonsingular  n  — c  by  n-c 

matrix.  (3.14)  states  simply  that  any  basis  of  S1  can  be  expressed  as  a 


1  -  ||  (A-''*)»BC-Wt\\ 
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nonsingular  linear  combination  StZ?t  of  the  columns  of  one  basis  Si.  We  want 
to  know  which  D  minimizes  k(Sd).  We  compute 


k*(Sd)  =k(Sd*Sd)  (3.15) 

|  Di*Di  DfSfSzD2 
=  DZ*DZ 

We  may  now  invoke  Theorem  3.1  with  A~l/Z  =  D{~1 ,  B  =  Di*S1mSzDz>  and 
C~uz  -  D£l  to  find 


k?(Sd)  is 


1  +  II  5^5, H 

1-  liSi*Sell 


1  +  cos  t? 
1  —  cos 


(  0  <  <;  n/  2  ) 


=  cotz(tf/2) 
or 


/c(Sfl)fecot  13/ 2  .  (3.16) 

If  Dx  and  Dz  are  unitary,  it  is  easy  to  verify  that  we  have  equality  in  (3. 16), 
proving  (3.9)  with  S  cm  UAL  -  S ortho  as  desired.  Note,  however,  that  Scptiual 
is  far  from  unique,  since  there  are  many  orthonormal  bases  for  a  given 
space. 


It  remains  to  show  esc  =  ||  P\\  where  P  is  the  projection  onto  S1  paral¬ 
lel  to  SP.  Recall  from  section  (2.1)  that  if  we  assume  (without  loss  of  general¬ 
ity)  that  T  is  of  the  form  (2.4?) 


AR-RB 

B  1 

then  any  S  which  block  diagonalizes  T  is  of  the  form  (2. 10?  and  2. 12?) 


Also,  the  P  which  projects  onto  S1  parallel  to  is  (2.8?) 


where  1 1  .P 1 1  2  =  1  +  1 1  P 1 1  2- 


By  choosing 

Dx  =  I  and  D2  =  (/  +  R*R)~1/Z  (3.17) 

where  D2  can  be  any  matrix  such  that  D2D2*  -  (I  +  R*R)~X  we  obtain  an 

[/  /?|  \D\  ]  (/  R(I  +  R*R)~UZ 

5  =  [  /  J  l  D2 )  =  [  (I  +  R*R)~UZ 


=  [5,  |  S2] 

where  Sj  and  S2  contain  orthonormal  vectors. 
Thus 


i*  =  arccos  ||  |  (0<i)Sff/2) 

=  arccos  ||  R(I  +  R9RYu2\\ 


esc  i>  =  (1  —  cos2  t3)-1/2 

=  (1  —  ||  R(I  +R»R)-'/z li2)-172 

=  (1  -  ||  (/  +  RmR)~w2R*R(I  +  /?V?)-l/2||  )'1/z 
(since  ||  /f  ||  2  =  ||  H*H ||  for  any  matrix  H  ) 


=  (1  -  ||  R*R(I  R*R)-' 1 1  )'1/2 


=  (1- 


iLggn  1-./1 

i  +  ii^ii 


=  (i  +  ii^h)i/z 


■  *1+ 11*11* 


(3.  IB) 


(3.19) 


■11*11 


as  desired. 
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It  is  possible  to  choose  Dx  and  Dz  which  are  multiples  of  the  identity  and 
also  achieve  the  minimum  condition  number.  We  record  this  fact  here 
because  we  will  need  it  in  Chapter  4.  Choose 


Uj  =  /  and  D2  =  (1  +  ||J?1I*)-1/*  /  .  (3.20) 

We  will  show  that  with  this  choice  of  D\  and  Dz  k(S)  attains  its  minimum 

value.  First  pick  unitary  Qi  and  Qz  so  that  Q^RQl  =diag(rt)  is  diagonal  (Le. 

the  ||  R\\  •  •  •  a trn  are  the  singular  values.)  Then 


te, 

ter 

tei 

I  -R 

D, 

cr 

S'  = 

Qz 

5  Q 

21  =  9z 

I 

Dz 

Ql 

V 


VXT 


T) 


■) 


has  the  same  condition  number  as  5.  and  is  the  direct  sum  of  2  by  2  blocks 
and  1  by  1  blocks.  It  is  a  simple  matter  to  compute  the  largest  and  smallest 
singular  values  of  this  new  matrix,  and  to  show  in  particular  that 


5||8  = 


*11  +  V!]/?|I*  +  1 


V||tf||z  +  1 


and 


II 5-MI2  =  V||*||*+  1  •  (||tf||  +^||/?||z+  1)  . 

The  results  follows  from  multiplying  these  two  expressions  to  get  *(S). 

3.4  How  to  Decompose  T  into  b  Blocks  when  b  >2 

In  this  section  we  first  consider  partitioned  matrices 

5  =  [S,  |  •  •  •  |5*]  (3.21) 

where  each  submatrix  Si  must  span  a  given  subspace  S*  and  show  that  5  is 

nearly  best  conditioned  when  each  St's  columns  are  orthonormal.  Next  we 

bound  the  condition  number  of  the  best  such  5  above  and  below  in  terms  of 

max  esc  where 
< 
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=  15(5  .  span^  ])  .  (3.22) 

Rnally  we  will  discuss  a  different  choice  of  S  (also  discussed  in  the  literature 
[Smith,  Wilkinson2])  which  is  harder  to  compute  and  has  slightly  different 
bounds  on  its  condition  number. 

Theorem  3.2:  Let  S  be 

S  =  [Stl  ■  •  •  |  5*]  (3.23) 

where  5»  contains  c*  columns. 

If  we  choose  the  columns  constituting  Sx  to  be  any  orthonormal  basis  of 
the  subspace  Sf,  then  5  will  have  a  condition  number  no  larger  than  V2T 
times  the  smallest  possible: 

c(S)sV F  ■  k(S optimal)  ■  (3.24) 

Said  another  way,  choose  5  so  that  S*S  has  identity  matrices  (of  sizes  ct  by 

c* )  as  diagonal  blocks. 

Proof:  This  proof  is  a  simple  generalization  of  the  proof  that  by  diagonally 
scaling  an  n  by  n  positive  definite  matrix  to  have  unit  diagonal,  its  condition 
number  is  within  a  factor  of  n  of  the  lowest  condition  number  achievable  by 
diagonal  scaling  [vanderSluis].  We  generalize  diagonal  scaling  for  unit  diago¬ 
nal  to  be  block  diagonal  scaling  for  block  unit  diagonal,  Le.  to  have  identity 
matrices  (of  various  sizes)  on  the  diagonal.  We  show  that  a  block  diagonal 
scaling  with  b  blocks  produces  a  matrix  whose  condition  number  is  within  a 
factor  b  of  the  lowest  possible  condition  number. 

Assume  S*  forms  an  orthonormal  basis  of  S*  and  let  D  be  a  block  diago¬ 
nal  nonsingular  matrix  whose  blocks  Di  are  ct  by  c{.  Then  any  S'  whose 
columns  S\  span  S1  can  be  written  5'  =  SD  for  some  D.  Now 
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!  Siv ! 


vr«®>  =  vf  »  M^iL  .  <3.35) 

min  U  S*  ,j 1 1  -0  *XL>0  I J  ffmta(S) 

—  Ii^-l*ll 

where  z,  is  chosen  so  that  ||z9||  =  1  and  j|Sz,||  =  o^(S)  =  the  smallest 
singular  value  of  5,  and  w0  is  chosen  so  ||tx;0||  =  1  and 
||  D~hua  ||  =  With  this  choice  of  w0  the  factor  ||  D~lzt  |)  /  ||  D~xv)9  || 

is  at  least  one.  Since  D  is  block  diagonal,  can  be  chosen  to  have  nonzero 
components  corresponding  to  only  one  block  of  D.  Thus, 
||Suj#)|z  =  1 1  wa  *S*Swt  1 1  =  ||  wt  *tt»0  1 1  =  1.  Since  the  largest  singular  value 
ffina>(5)  satisfies 

«(5)  =  I!  511  *  Villen*  =  -x/JT  =  VF  . 

V  <*1  V  iml 

we  get 


^k{SD)  *  ^|j-=  *(S)  ■  (3.26) 

Since  (3.26)  is  true  for  any  D ,  it  is  true  in  particular  when  SD  =  SomuM,. 
Q.E.D. 

Van  Dooren  and  Dewilde  [VanDooren]  have  improved  the  factor  \/F  and 
shown,  in  particular,  that  if  the  subspaces  are  themselves  orthogonal,  then 
the  above  choice  of  S  is  in  fact  optimal. 

In  the  case  6  =2  we  expressed  k^Sqpttj^)  in  terms  of  esc  d,  where  d  was 
the  smallest  angle  between  S1  and  SP.  We  can  also  bound  ic(S)  here  in  terms 
of  the  esc  d*.  where  d*  is  the  angle  between  S1  and  its  complement 

lliaonem  3.3:  Let  T,  S  and  esc  d*  be  defined  as  above.  Then 


max  (esc  dt  +  Vcsc*  d<  -  1)  ■&  k(S)  £  VF 
or  weakened  slightly, 


(3.27) 


Proof:  This  proof  is  based  on  a  similar  result  of  Wilkinson's  [Wilkinson2,  p.  89] 
when  all  invariant  subspaces  are  one  dimensional.  First  we  will  prove  the 
lower  bound  and  then  the  upper  bound. 

From  (3.18)  we  know  that  any  S  (not  just  the  one  defined  above)  which 
has  one  group  of  columns  spanning  E?  has  a  condition  number  bounded  from 
below: 

k(S)  i  cot  iS4/2  =  esc  +  V esc2  d*  -  1  .  (3.29) 

Since  (3.29)  is  true  for  all  i,  the  lower  bound  follows  easily. 

We  compute  the  upper  bound  as  follows: 

k{S)  =  ||  S 1 1  ||  5-‘|j  *  ^  II  S-‘||  (3.30) 

since  ||  5||  s  VF  (as  mentioned  in  the  proof  of  Theorem  2).  Using  notation 

analogous  to  (2. 14)  define  the  matrix  P4 

P«  =  Sx  (5-‘)(<)  (3.31) 

(which  would  be  the  matrix  projection  onto  S1  for  the  standard  eigenprob- 

lem).  Since  St  consists  of  ortbonormal  columns,  (3.31)  and  then  (3.19)  yield 

II  (5~1)(<)||  =  ||  AH  =csctft  (3.32) 

Thus 

II  s-' II  *  VilK5'1)10!!8  =  "x/ (3.33) 

v  4i|  v  t*l 

and  the  upper  bound  follows.  Q.E.D. 

The  lower  bound  in  Theorem  3.3  has  been  proven  by  Bauer  [Bauer4]  in 
the  case  when  all  invariant  subspaces  are  one-dimensional . 

The  other  choice  of  S  discussed  in  the  literature  is  scaled  so  that  the  i- 
th  diagonal  block  of  S*S  is  esc  times  an  identity  matrix  of  size  c4  by  ct. 
With  this  choice  of  5  the  i-th  diagonal  block  of  (5*5)-1  has  the  same  norm  as 
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the  corresponding  block  of  5*5,  namely  esc  dt.  Smith  [Smith]  showed  in  the 
case  when  all  invariant  subspaces  are  one-dimensional  that  this  choice  of  5 
is  optimally  scaled  with  respect  to  the  condition  number  defined  with  the 
Euclidean  norm: 

**<5)  =  ||S||,  ||S-»|i*  ■ 

More  generally,  with  this  choice  of  5,  Theorem  2  is  weakened  slightly  to 
become: 

Theorem  3.2a:  With  5  chosen  so  that  the  i-th  diagonal  block  of  5*5  is  esc 
times  an  identity  matrix,  we  have 

k(S)  <  b  ■  tci.SgprjHAi)  .  (3.34) 

Proof:  Similar  to  Theorem  3.2. 

Theorem  3.3,  on  the  other  hand,  becomes  slightly  stronger: 

Theorem  3L3a:  With  5  chosen  as  in  Theorem  3.2a.  we  can  bound  c(5)  as  fol¬ 
lows: 

max  (esc  dt  +  Vcsc*  -  1)  £  e(5)  s£  £  esc  d*  .  (3.35) 

*  <i*i 

Proof:  Similar  to  Theorem  3.3. 

The  upper  bound  of  Theorem  3.3a  generalizes  a  result  of  Wilkinson  [Wil- 
ldnson2,  p  89]  for  one  dimensional  invariant  subspaces.  Note  that  the  "spec¬ 
tral  condition  numbers"  1/  |  s*  |  used  by  Wilkinson  and  others 
[Smith, Wilkinson2]  are  just  esc  dt  (or  1 1  1 1 )  when  the  invariant  subspaces 

are  one-dimensional.  When  £  esc  d4  is  large  the  upper  bound  in  (3.35)  is 

<«i 

comparable  with  the  upper  bound  on  k(5 optimal)  given  by  Bauer  [Bauer4, 
Theorem  VH]  in  the  case  of  one-dimensional  invariant  subspaces. 
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This  choice  of  S  is  more  difficult  to  compute  than  the  S  of  Theorems  3.2 
and  3.3  because  of  the  need  to  compute  the  esc  d*.  though  not  much  more 
difficult  if  the  subspaces  are  all  one  or  two  dimensional. 

3.5  Proof  of  Theorem  3.1 

This  theorem  was  stated  in  section  3.3. 

Unit  vectors  ieC"  and  y  e  C*  satisfying 

y»(A-'/*)*BC-1/zx  =  ||  {A~l/Z)*BC~WZ\\  (3.36) 

must  exist  Use  them  to  construct  the  unit  vectors 

•  *  A-wty/\\A-l/zy\\  .  =  C~l/ix/  ||  C~l/Zx||  .  (3.37) 

and 

*(*)  =  k  cosJl  ■  <3-30) 

We  want  to  consider  H  acting  on  the  2-dimensional  subspace  in  which  s(d) 

lies.  Now 

s*(d)/fc(d)<A  (3.39) 

implies 

[.".in*,  »•=»»<]  (3.40) 

or 

ain^d  •  z*Az  +  cos^d  •  w*Cw  +  sindcosd  (w*B*z  +  zmBw)  c  A  .  (3.41) 
To  simplify  notation,  let  a  -  z*Az  and  c  a  w*Cw. 

From  (3.36)  and  (3.37)  we  know  that 

z*Bw  =  \\{A-uz)BC-'/i\\/{\\A-'/*y\\  ■||C-1'zx||)  (3.42) 

*  ||  (A-v^BCr”*]]  ■  ||  Awzz\\  ■  ||  Cl/iw\\  . 

Since  (C,/8)*Cl/z  =  C.  we  get  c  a  WCw  =  ||  w*{Cui)*Cwzw\\  =  ||  Cl/Zw  || z. 

Similarly,  a  -  z*Az  -  ||  Al/iz  ||  *,  so  (3.42)  becomes 


z*Bw  -  1IG4-I'8)«scr‘/Z||  ■  V5c  . 
Substituting  (3.43)  into  (3.41)  and  rearranging,  we  obtain 
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(3.43) 


( cos2i>  +  VoF  ||  (A-l/z)*BCrl'z ||  sm2U  £  A  (3.44) 
Since  i>  was  arbitrary,  we  can  maximize  the  LH.S.  of  (3.44)  over  yielding 

(— )  +  V(£f2^z  +  ac  1IU_1/8)*5C~1/2I!2!S  A  •  (3-45) 

or 


U^-i/aj.^C-i/Zjj  s  V(A_-  (cta)/2)j  -  ((c-a)/2)z 


a  V(A  -  a )  (A  -  c ) 
Vac 

Similarly,  the  inequality 


implies 


Ass*(i»/fc(tf) 


(3.46) 


(3.47) 


XS(^  +  (~^)  cos2t>  +  VoF  ||  (A~X/9)*BC~X/Z\\  sin2i?  (3.48) 

Minimizing  the  R.H.S.  of  (3.48)  over  we  obtain 

X  *  ( ^  -  V +  «  II  U-u*)*BC-"*\\  2  (3.49) 

or,  rearranging. 

||(A~1/z)*FCr1/8||  *  -^C,-Z^L  .  (3.50) 

Combining  (3.46)  and  (3.50)  yields 

||  (A~l/Z)*BC~UZ\\  <  min[V(a  -  X)  (c  -  \)/  {ac )  .  V(A  -  a)  (A  -  c )/ (ac )]  . 
All  we  know  about  z*Az  *  a  is  that  XsssA.  and  similarly  A  <  c  a  w*Cw  s  A. 
Thus 

||  (A~v2)*fiCTl/8||  s  ^max^  min[v(a-X)(7-\)/  (ya) ,  ViA-a)(A~7)/  (7o)](3.5l) 
Since  (a  -  A)/ a  is  an  increasing  function  of  a  and  (A  -  a)/  a  is  a  decreasing 


function  of  a  in  the  range  Afia&A.  we  see  the  max  in  the  last  inequality 
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occurs  when  the  two  arguments  of  the  min  are  equal.  This  equality  implies 

(a  -  X)  (7  -  X)  =  (A  -  a)  (A  -  7)  (3.52) 

or 

a  +  7  =  A  +■  X  .  (3.53) 

Substituting  (3.53)  into  (3.51)  yields 

||  ||  *  m£ucV(7  -  X)  (A  -  7)/  V^A  +  X  -  7)  (3.54) 

-  A-X 
”  A  +  X 

-  *  ~  1 

"  *+  1  ' 

as  desired. 

Any  2  by  2  positive  definite  matrix  whose  diagonal  entries  are  equal 
shows  the  the  inequality  of  Theorem  3. 1  is  sharp. 

We  now  show  that  given  *  and  Z  -  (A~ur)*BC~wz  such  that  ||  Z||  <  1 
and  the  inequality  of  the  theorem  is  sharp,  it  is  possible  to  construct  an  H 
with  the  given  constraints.  Simply  choose 

A  -  I  ,  C  =  T  and  B  -  Z  (3.55) 

corresponding  to  the  (arbitrary)  choice  A  =  1  +■  ||  Z||  and  X  =  1  -  ||  Z\\ .  It  is 

easy  to  verify  that  every  inequality  in  the  proof  is  sharp  for  this  choice  of  A, 

B,  and  C.  Q.E.D. 

Theorem  3.4:  Let  H,  A.  X.  and  k  be  as  above.  Define  X~1/s  such  that 
X~w\X~ui)*  -  X~l.  Then  the  following  inequalities  are  sharp: 

||  BC~X  ||  a  |-(V«  -  1/V<)  (3.56) 

||  A-'B\\  *  |"(vTc  -  1/Vic)  (3.57) 


||  £|j  <f(A-X) 


(3.58) 
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||  {A~ui)*B\\  £  VA-  VX 

(3.59) 

||  BC~1/2|| 

(3.60) 

Proof:  All  the  proofs  are  analogous  to  the  proof  of  Theorem  3.1.  To  prove 
(3.56),  for  example  (also  proved  in  [Bauer3]).  choose  z  and  y  unit  vectors 
such  that 

z'BCT'y  =  j|5C-,|| 

and  let 

*  =  CT'y/\\C~'y\\  . 

Consider  H  restricted  to  the  two  dimensional  subspace  in  which 

*<»>  *  fcissl 

lies.  The  rest  of  the  proof  follows  similarly  to  that  of  Theorem  3.1. 

We  can  also  show  that  given  k  and  arbitrary  Ft  -  BC~l  such  that  (3.56)  is 
sharp,  it  is  possible  to  construct  an  H  with  the  given  constraints.  Simply 
choose 

C=I  ,  A  =  ( )  /  and  B  =  R  (3.61) 

corresponding  to  the  (arbitrary)  choice  A  =  (*  +  l)/2  and  \  =  (k  +  l)/2 k.  It 
is  easy  to  verify  that  every  inequality  in  the  proof  is  sharp  for  this  choice  of 
A,  B,  and  C. 

Note  that  Theorems  3.1  and  3.4  are  still  true  when  A,  B,  and  C  are  con¬ 
forming  submatrices  extracted  from  a  larger  H  (or  Q*HQ  with  Q  unitary) 
since  the  bounds  are  monotonic  in  k  (or  A  and  A).  In  particular,  if  A,  B,  and  C 
are  scalar  Theorem  3.1  becomes  an  inequality  of  Wielandt  [Bauer2). 

3.6  Computing  a  Function  of  a  Matrix 

In  this  section  we  want  to  show  why  a  well  conditioned  block  diagonaliz¬ 
ing  matrix  5  is  better  than  an  ill-conditioned  one  for  computing  a  function  of 
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a  matrix  T.  Assuming  /(f)  is  an  analytic  function  of  T,  we  compute  f  (T)  as 
follows: 


1/(0,) 


f(T)  =  f(SQS~l)  =  Sf(9)S~l  =  s 


S~l  .  (3.62) 


I  /(®m)j 

The  presumption  is  that  it  is  easier  to  compute  /  of  the  small  blocks  9*  than 
of  all  of  T.  We  will  not  ask  about  the  error  in  computing  /  (9t)  but  rather  the 
error  in  computing  9  =  S~lTS  and  f  (T)  =  Sf  (9)S~l.  In  general,  we  are 
interested  in  the  error  in  computing  the  similarity  transformation 


X  =  SYS~K 


We  assume  for  this  analysis  that  we  compute  with  single  precision  float¬ 
ing  point  with  relative  precision  e.  That  is,  when  ■  is  one  of  the  operations 
or  /,  the  relative  error  in  computing  is  bounded  by  e: 

/£(a«6)  =  (a«6)  (l+e)  where  .  (3.63) 

Using  (3.63)  it  is  easy  to  show 

lijmmii  3.5:  Let  A  and  B  be  real  n  by  n  matrices,  where  ne  <  .1.  Let  |A! 
denote  the  matrix  of  absolute  entries  of  A:  \A  |v-  =  |j4^  | .  Then  to  first  order 
in  e  the  error  in  computing  the  matrix  product  AB  is  bounded  as  follows: 

\fl(AB) -AB\  <ne\A\  \B\  ■  (3.64) 

Proof:  See  [Wilkinsonl]. 

Computing  X  =  SYS'1  requires  two  matrix  products:  Z  ~  fl(SY)  and 
X  *  f  l  (Z5-1).  where  we  assume  S  and  S-1  are  known  exactly.  Applying 
Lemma  3.5  to  these  two  products  leads  to 

lemma  3.8:  To  first  order  in  t 


||  fl (SYS'1)  -  SYS'1 1|  s  *  Zn2tic(S)  1 1  Y ||  M  . 


(3.65) 
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Proof: 

ll/^SKS-1)  -SYS-'\\s 

*  II  I  fl(SYS~')  -  SYS-1  |||x 

=  ||  |  fl(SYS~')  -fl(SY)S~l  +  fl(SY)S~1  -  SYS~l  |  ||x 

*  ||  |  fl(SYS~l)  -  fl(SY)S~'  |  ||  x  4  ||  |  fl(SY)S~' -  SYS-'  |  ||* 

<nt  ||  1/1(5101  \S~'\  ||x  +  ne  ||  |S|  \Y\  |S~‘|  ||* 

<2nc  ||S||x  ||  Y\\g  ||  5-1||x 
(to  first  order  in  e) 

<2n*t<c{s)  unix 
Q.E.D. 

Assuming  this  bound  is  realistic,  it  is  clear  that  picking  S  to  keep  k(S) 
small  is  advantageous.  The  error  in  computing  similarity  transformations  of 
matrices  Is  discussed  in  more  detail  in  Wilkinson  [Wilkinson2.  chap  3]. 

3.7  On  Prelection  Norms  as  a  Partitioning  Criterion 

The  analysis  of  the  last  section  suggests  that  if  the  purpose  of  our  eigen- 
decomposition  is  to  compute  functions  of  matrices,  then  it  may  be  sufficient 

to  compute  the  partition  £=jai . a*j  of  a  subject  only  to  the  constraint 

that  ||  Pjll  be  less  than  some  threshold  H  for  each  i,  rather  them  the  more 
complicated  requirement  that  the  dissociation  between  a*  and  u  o<  be  larger 

than  some  threshold  e.  This  is  because  the  error  bounds  depend  only  on  pro¬ 
jector  norms.  For  example,  it  is  trivial  to  compute  the  exponential  of  a  diago¬ 
nal  matrix  by  exponentiating  each  diagonal  entry,  and  any  diagonal  matrix  is 
decomposable  by  the  projector  norm  criterion  (all  projectors  are  of  norm 
one).  The  more  stringent  dissociation  criterion,  however,  would  forbid  any 
decomposition  of  the  identity  matrix,  which  is  clearly  a  bad  idea.  Kfigstrbm 
[K$gstr8m2]  has  used  this  partitioning  criterion  successfully  for  computing 
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matrix  functions. 


There  is,  however,  a  small  problem  with  defining  £  in  terms  of  bounded 
projector  norms  instead  of  the  dissociation  criterion:  partitions  defined  by 
bounded  projector  norms  do  not  satisfy  the  intersection  property  described 
in  section  2.1.  In  other  words,  E1  =  jo,  \J<rz  ,  c3j  may  be  a  legitimate  partition 
since  ||  Pt  +  Pz\\  =  ||  P3||  <  k,  and  £8  =  ,  <x2J  may  be  a  legitimate  par¬ 

tition  since  ||  P,  +  P3H  =  ||  P2||  <  Z,  but  E  =  E1  =  $0i  .  er2  .  a3j  may  not  be 
legitimate  since  ||  P,||  can  be  as  large  as  ||  Pz  +  P3 1|  s  ||  Pz\\  +  ||  P3||  <  ZmZ. 
Consider,  for  example. 


T  = 


0  z 
1 


z 

0 

-1 


with  z8  a  little  less  than  J8®-1,  0,={Oj,  o2=Jlj,  and  o3={-lj.  A  factor  of  2  is 
not  bad,  especially  since  it  does  not  seem  likely  we  can  measure  ||  P\\  or  the 
smallest  ||ir||  that  accurately  for  a  reasonable  price.  Nonetheless,  it  is 
unfortunate  that  we  lose  the  intersection  property  which  makes  the  best  par¬ 
tition  well  defined. 


3.8  Applications  of  a  Variation  of  Theorem  3. 1 

It  is  more  convenient  here  to  use  a  slight  variation  on  Theorem  1,  stated 
as  (3.56)  in  Theorem  3.4: 

||  A~lB\\  as  |-(>/k  ~  1/v^)  • 

Application  1:  Cholesky  without  square  roots.  The  square  root  free  Cholesky 
algorithm  (triangular  factorization)  decomposes  a  positive  definite  Hermitian 
matrix  H  into  the  product  of  a  unit  lower  triangular  matrix  L,  a  nonnegative 
diagonal  matrix  D,  and  L*: 


H  =  LDL* 


50 


We  wish  to  bound  the  entries  of  L.  Consider  the  following  partitioning  of  the 
decomposition: 

From  (3.88)  we  see 

LXDXR*  =  B 
or 

R*  =  (LxDy)-'B 

=  L*X{LXDXL*X)~'B 
-  L*XA~XB  . 

Since  L*  x  is  unit  upper  triangular,  the  last  row  of  K  and  the  last  row  of  A~1B 
are  identical.  But  the  last  row  of  R*  is  the  conjugate  transpose  of  a  subdiago¬ 
nal  column  of  L.  Thus 

||  subdiagonal  column  of  X,||  =  ||  last  column  of  corresponding  A-1B|| 

<  1 1  all  of  corresponding  A~lB  1 1  , 

and  so  Theorem  3.4  implies 

||  subdiagonal  column  of  Zj|  <  -  l/>/*)  ■ 

M 

A  2  by  2  example  suggested  by  the  proof  of  Theorem  3.4  (see  (3.60))  shows 
this  bound  is  achievable. 

This  bound  is  tighter  than  the  simpler  bound 

| Lq  |  <  ■'/(//#  -  Du)/ Dj}  s  V(A  —  A)/ X  -  V*  —  1  ,  (3.67) 

which  is  derived  by  considering  the  i.i-th  entries  of  both  sides  of  H  -  LDL*\ 

L§Djj  +  Du  +  positive  terms  =  H%  . 

This  result  can  also  be  used  to  get  a  lower  bound  on  k{H  )  given  its  Chole- 


A 


\Lx*  R* 
Lz * ! 


(3.66) 


sky  decomposition. 
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A  similar  application  to  Gauss-Jordan  elimination  appears  in  [Bauer3] 

Application  2:  Gram-Schmidt  Orthogonalization  Process.  The  Gram-Schmidt 
process  takes  a  set  of  independent  vectors  liissm,  and  produces  a 

set  of  orthonormal  vectors  gt  CC",  lsism,  where  g*  is  a  linear  combination 
of  i»j  through  vi  and  orthogonal  to  t»j  through  for  i>l.  We  wish  to  bound 
the  coefficients  of  gj  to  gt_t  (or  to  i^-i)  in  the  expression  for  qt.  We  do  this 
by  showing  Gram-Schmidt  to  be  equivalent  to  square-root-free  Cholesky  per¬ 
formed  on  a  certain  matrix,  and  use  Application  1. 

The  Gram-Schmidt  process  expresses  g{  as  a  linear  combination  of  vi 
and  qx  through  gt_j.  Let  V  be  the  n  by  m  matrix  %hose  columns  are  the  vec¬ 
tors  Vi  and  let  Q  be  the  n  by  m  matrix  with  columns  g».  Then  the  Gram- 
Schmidt  process  may  be  expressed  succinctly  as 

V-QDuiU  .  (3.68) 

where  1/isannbyn  unit  upper  triangular  matrix  and  D  is  an n  by  n  nonne¬ 
gative  diagonal  matrix.  The  entries  of  U  are  the  coefficients  we  seek  to 
bound.  Multiplying  both  sides  of  (3.88)  on  the  left  by  their  transposes,  we 
obtain 

U*DU  .  (3.69) 

U  is  the  factor  of  V*V  obtained  by  doing  square  root  free  Cholesky.  Thus, 

from  Application  2  we  see 

|j  superdiagonal  column  of  C/]|  s  ^-(  -  1/ Vc(V*V’)  )  ,  (3.70) 

which  is  the  desired  bound. 

If  we  wanted  to  express  gt  as  a  linear  combination  of  vt  through 
instead  of  and  gi  through  gt-],  we  would  express  the  Gram-Schmidt  pro¬ 


cess  as 
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VODrV*  =  Q  .  (3.71) 

What  is  a  bound  for  the  columns  of  O'?  Multiply  both  sides  of  (3.71)  on  the  the 
left  by  their  transposes  to  obtain 

D-v*0*VV0D-l/*  =  Q*Q  =  /  .  (3.72) 

or 

0*K)->=  OD-'D*  .  (3.73) 

0  is  the  factor  of  (V*V)~X  obtained  by  doing  square  root  free  Cholesky  start¬ 
ing  at  the  lower  right  corner  of  (V»K)~*  instead  of  the  upper  left  comer  as  is 
usual.  Thus,  from  Application  2  we  see 

||  superdiagonal  column  of  0j|  £  |-(  V*((y*l')-1)  -  1/  Vc(( V*0~'))  (3.74) 

=  |-(  VSW  -  1/v^FJ)  . 

since  k(AI)  —  k(M  ')  for  all  M.  Thus,  we  get  the  same  bound  on  the  columns 
of  0  as  on  the  columns  of  U. 
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Chapter  4:  Lower  Bounds  on  (11882(17!  ,  o2  ,  region) 

4.1  Introduction 

In  this  chapter  we  present  a  new  lower  bound  on  diss2(oi  ,  ctz  ,  region) 
which  is  sharper  than  previous  lower  bounds.  In  section  4.2  we  present  a  his* 
tory  of  previous  lower  bounds,  in  section  4.3  we  present  and  prove  our  new 
bound,  and  in  section  4.4  we  present  examples  when  the  new  lower  bound  is  a 
good  estimate  of  dissg^ ,  Og  ,  region),  and  other  examples  showing  how 
badly  the  lower  bound  can  underestimate  dissg^  ,  <j2  ,  region).  These  exam¬ 
ples  will  be  needed  in  chapter  8  on  probabilistic  bounds.  Since  we  will  only 
be  using  the  "region"  based  dissociation  notion,  we  will  drop  the  word 
"region"  from  the  arguments  of  the  dist  function  for  notational  simplicity  in 
this  chapter. 

4.2  Previous  Lower  Bounds 

We  discuss  three  previous  lower  bounds  in  this  section.  The  first  two.  due 
to  Dunford  and  Schwarz,  and  Bauer  and  Fike,  are  usually  called  inclusion 
theorems  because  they  give  upper  bounds  on  the  perturbations  in  eigen¬ 
values  given  the  norm  of  the  perturbation.  We  will  use  the  results  of  Chapter 
3  to  show  that  these  results  are  essentially  identical  and  derivable  from 
Gerschgorin's  Theorem  [Isaacson].  The  third  result,  due  to  Stewart,  was  until 
now  the  sharpest  known  bound.  We  will  discuss  lower  bounds  on  diss8((71  ,  o2). 
which  are  also  lower  bounds  for  dissj(<7i  ,  og)  by  inequality  (2.4). 

The  first  result  is  due  to  Dunford  [Dunford,  lemma  6]  and  Schwartz 
[Schwartz,  lemma  3].  Taken  together,  these  lemmas  show: 

Theorem  4.1  (Dunford  and  Schwartz):  Let  T  be  completely  diagonalizable 
with  eigenvalues  \  and  corresponding  projections  Pt.  If  X  is  an  eigenvalue  of 
T+E,  then  for  some  i 
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I X'  -  X*  |  *4‘Xiux||P<||  \\E\\  . 

In  other  words,  the  eigenvalues  X'  of  T+E  lie  in  circles  of  radius 
4  max 1 1  P,  1 1  1 1  2? 1 1  centered  at  the  eigenvalues  of  T.  From  this  result,  it  is 

easy  to  derive  the  following  lower  bound  on  dissg ,  <7Z): 

Corollary  4.2: 

min  jXj  -  Xg| 

•  •"> *  squall 

Proof:  This  condition  assures  that  no  circle  around  any  X^Oj  can  intersect 
any  circle  around  some  XgCo8.  Q.E.O. 

The  Bauer-Fike  Theorem  has  similar  assumptions  and  conclusions: 

Theorem  4.3  (Bauer  and  Flke):  Let  T  be  completely  diagonalizable  with 
eigenvalues  A*  and  diagonalizing  similarity  S.  If  X'  is  an  eigenvalue  of  T +E. 
then  for  some  i 

I X*  —  X*  |  <  inf  k(S)  ||  E\\ 
where  the  inf  is  over  all  diagonalizing  similarities  S. 

This  theorem  yield  a  lower  bound  on  dissg(oi ,  oj)  in  the  same  way  as 
Theorem  4.1: 

Corollary  4.4: 

min  |Xj  -  Xg| 
di,s»(<"  •  •*> 1  2"  Lnf  k(s) 

5 

Proof:  Analogous  to  Corollary  4.2. 

We  claim  these  two  theorems  are  nearly  equivalent  because  Chapter  3 
showed  that 

max||Pt||  <  inf  k{S)  <  dim(F)  •  max||  ||  . 

4  J  4 


(3.8) 
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so  that  the  expressions  in  the  denominators  of  the  corollaries  cannot  differ 
by  more  than  a  factor  of  4  dim(7’).  Furthermore,  the  Bauer-Fike  result  is 
easily  derivable  by  applying  Gerschgorin’s  Theorem  [Isaacson]  to  the  matrix 

STS'1  +  SES~ 1  =  diag(Aj  +  SES'1  . 

The  drawbacks  of  these  simple  lower  bounds  on  diss2(oi ,  ffz)  are  as  fol¬ 
lows.  First,  they  assume  that  T  is  completely  diagonalizable,  and  so  do  not 
apply  to  defective  matrices.  Second,  even  if  T  is  diagonalizable,  the 
max ||  Pi  ||  term  may  be  too  large  and  so  the  lower  bound  too  small  because 
of  nearly  equal  eigenvalues  in  some  irrelevant  part  of  the  spectrum.  For 
example,  by  making  rj  small  in 

100  .01 

IOO+17 

1 

0 

(where  Oi=[0]  and  o8={100.100+ij,lj),  we  can  make  the  lower  bound  on 
diss2(o,  ,  e'a)  as  small  as  desired,  whereas  diss^o, ,  a^)  actually  equals  .5  (we 
will  prove  this  later;  the  best  E  is  nonzero  in  the  lower  right  2  by  2  comer 
only). 

The  heretofore  best  lower  bound  on  diss2(ff| ,  a2)  is  due  to  Stewart 
[Stewart]: 

Theorem  4.5  (Stewart):  Assume  without  loss  of  generality  that  T  is  of  the 
form 


with  at=9(i4)  and  <78  =  a(B).  Let  P  be  the  projection  corresponding  either  to 
O]  or  Og.  Then 
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dissgfr,  .  <Tg)  *  •  (4.2) 

Proof:  The  proof  seeks  a  unitary  similarity  of  a  special  form  which  returns 
T+E  to  block  triangular  form.  The  nontrivial  part  of  this  similarity  satisfies  a 
matrix  Ricatti  equation  which  can  be  solved  by  an  iteration  which  is  a  con¬ 
traction  as  long  as  ||  £'||  is  smaller  than  the  expression  on  the  right  band 
side  of  (4.2).  The  details  of  the  proof  are  not  needed  in  this  thesis;  see 
Stewart  [Stewart]  for  more  information. 

Actually,  Stewart's  proof  only  appears  to  show  that  sep(.4.2?)/  (4j|  P||  )  is 
a  lower  bound  on  diss2(oi ,  o2  ,  path).  It  will  follow,  however,  from  the  com¬ 
ments  following  theorem  4.6  below  that  it  really  is  a  lower  bound  on 
(11352(0!  ,  e2  .  region). 

Stewart's  bound  is  generally  much  tighter  than  the  bounds  of  Dunford- 
Schwarz  or  Bauer-Fike.  We  are  pleased,  therefore,  to  have  found  the  improve¬ 
ment  in  the  next  section. 

4.3  A  New  Lower  Bound  on  diss2(o, ,  o2) 

Our  improvement  of  Theorem  4.5  is  based  on  an  approach  used  by  Varga 
and  Feingold  [Varga]  and  more  recently  Meyer  and  Veselic  [Meyer]  to  prove  a 
block  version  of  Gerschgorin's  theorem. 

Theorem  4.0:  Let  T,  A.  B,  and  C  be  as  in  Theorem  4.5.  Then 


diss2(oi  ,  o2)  5t 


wpxU.B) 

11*11  +  ^11  *11  7  ~  1 


Proof:  If  A  is  an  eigenvalue  of  T+E  but  not  of  T  then 


(4.3) 


0  =  det(A -T-E)  -  det(A-r)  det(/-(A -T)-'E)  =  det(/-(A -T)~'E) 
implying  that 


l*||(A-7’)-l£||  *|J(A-r)-‘||  II  £11 

Now  we  choose  a  block  diagonalizing  similarity  S  as  suggested  in  (3.20)  so 
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that  S~*TS  =  diag(A.B)-  (Note  that  the  other  5  suggested  in  section  3.3 
•would  yield  diag^.i?')  where  B'  is  similar  to  B  but  could  be  otherwise  much 
different.)  Thus 

1  s  ||  S'1  diag((X— A)-1  .  (X-5)'1)  5||  •  ||  E\\ 

&  k(S)  max  (||  (X— A)~*||  .  ||  (X-^)-'||  )  II  E\\ 
or 


k(S)\\  E ||  *  min  (||  (X-yl)-MI  .  II  (\-B)~'\\ “*)  . 

Just  as  the  inequalities  of  Theorems  4.1  and  4.3  show  that  the  eigenvalues  of 

T+E  lie  in  circles  centered  at  the  eigenvalues  of  T,  this  last  inequality  shows 

that  the  eigenvalues  of  T+E  lie  in  certain  regions  around  the  eigenvalues  of 

7*.  And  just  as  in  Corollaries  4.2  and  4.4,  we  can  derive  a  bound  on  ||  £,||  such 

that  the  regions  belonging  to  a(A)  and  the  regions  belonging  to  a(B)  remain 

disjoint. 

Indeed,  as  long  the  region 

k(S)\\E))  =  ll(X-A)~MI 

remains  disjoint  from  the  analogous  region  for  B,  ||£,||  ^ss^O!  ,  a2).  Ima¬ 
gining  these  regions  as  functions  of  |j  E\\ ,  there  is  r  smallest  ||  £"||  for  which 
these  regions  can  intersect,  which  means  there  is  a  X'  such  that 

*s)\\r\\  =  n  (x--A)-l|l  "*  =  II (v-^)-1!!  **  • 

From  the  definition  of  sep*.  it  is  clear  that  sepx(4 ,B)  is  less  than  or  equal  to 
both  ||  (X'-j4)-,|I  -l  and  ||  (X'-i?)-,||  ~l-  Thus. 

*(‘5)||  £"H  as  sep k(A,B) 
or,  substituting  the  value  of  jc(5)  and  rearranging 


- 

11*11  +  V||P||S-i 

Since  ||£"||  is  the  smallest  value  for  which  the  regions  can  possibly  inter¬ 


sect,  the  proof  is  complete.  Q.E.D. 
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Lemma  2. 10,  plus  the  inequality 


Il-Pll  +  V||P||z-l-;2  IIFjl 

show  that  the  bound  of  Theorem  4.6  is  always  larger  than  the  bound  of 
Theorem  4.5.  The  two  bounds  can  be  significantly  different  because  sep  can 
be  much  smaller  than  sep*.  Just  how  much  smaller  is  the  subject  of  the  next 
chapter,  but  we  present  an  example  to  illustrate  typical  behavior.  If 


A  = 


e  1 


and  B  =  -A 


are  both  n  by  n  matrices,  then 


aep(A.B)  =  and  sepxCA.ff)  =  9(en) 

(the  notation  /  =9{g)  means  /  is  exactly  of  order  g:  /  =0(g)  and  g=0(/)). 

We  discuss  this  example  in  detail  in  chapter  5.  Experience  in  constructing 

examples  like  this  led  us  to  our  conjecture  in  chapter  8  that  even  though  sep 

and  sep*  are  almost  always  close,  when  they  are  far  apart  they  can  differ  by 

at  most  a  square  as  in  the  example. 


An  interesting  corollary  of  this  theorem  holds  when  T  is  block  diagonal: 
Corollary  4.7:  Suppose  T  is  block  diagonal: 

T-\ 

Then 


diasa(a,  .  o2)  -  sep X{A,B) 

VS  sepx(A,B)  dissf(ci  ,  o2)  i  sep*(,4,i?) 
In  particular,  if  A  =$4  and  2?=®2?<  then 


diss8{(71 .  <7g)  =  min  sepx(Aj  .  Bt) 


(4.4) 

(4.5) 


(4.6) 
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Proof:  j|/’||  is  clearly  1.  Thus,  the  lower  bound  in  Theorem  4.5  equals  the 
upper  bound  in  Theorem  2.8,  proving  (4.4).  (4.5)  follows  similarly.  (4.6)  fol¬ 
lows  from  Lemma  2.12.  Q.E.D. 

Thus,  dissz(ai  ,  az)  satisfies  the  same  divide  and  conquer  paradigm  as  do 
sep  (Lemma  2.6)  and  sepx  (Lemma  2.12).  In  particular,  there  is  a  smallest 
perturbation  (measured  with  ||  ||  and  not  ||  ||^)  with  the  same  block  diago¬ 
nal  structure  as  T.  This  proves  the  claim  made  about  the  T  in  (4. 1),  since  it 
is  block  diagonal. 

4.4  When  is  the  Lower  Bound  a  Good  Estimate? 

In  this  section  we  discuss  when  the  lower  bound  of  Theorem  4.6  is  likely 
to  be  sharp.  We  have  already  seen  that  it  is  sharp  for  block  diagonal 
matrices.  We  will  show  that  it  is  also  sharp  for  2  by  2  matrices,  and  nearly  so 
when  A  is  1  by  1  and  B  is  diagonal  (we  will  need  this  special  case  in  chapter  8 
on  probabilistic  bounds).  We  then  present  two  examples  when  the  lower 
bound  is  much  too  low;  in  the  first  example  A  is  1  by  1  and  B  is  a  Jordan 
block,  and  in  the  second  both  A  and  B  are  2  by  2  d> agonal  matrices.  This 
second  example  leads  to  a  "combinatorial"  improvement  on  our  new  lower 
bound. 

Lemma  4.B:  If 


is  2  by  2,  then 

diss,(a,  .  Pg)  =  dissz((7l  .  Pg)  = 

_ _ 

2(|c|  4  Vjc|8+  |6-a|s) 

Furthermore,  this  last  expressions  also  equals  both  dissg(pj  ,  Pg  ,  path)  and 
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diss£ (cxi  ,  a2  .  path). 

Proof:  Choose  u  on  the  unit  circle  so  that  uc/(b  -o)  is  real  and  nonnegative. 
Then  choose  d  so  that 


cot  d/2  =  -22—  . 

o  —ti 


Let 


p  =  ||  -Pi!  =  Vi  +  I  c  |  V 1 6  • 
It  is  easy  to  see  that  the  matrix 


Q  = 


leosd  Esind 
sind  Bcosd 


is  unitary,  and  that 


r  =  QTQ*  = 


a+b 

2 


o+b 


[2(p+vpri)  2 

where  represents  a  complicated  expression  that  is  not  important.  Clearly, 
by  changing  the  lower  left  entry  of  T'  to  0  we  will  change  T  to  a  matrix  with  a 
double  eigenvalue  at  (a+b)/2.  Both  the  two  norm  and  Frobenius  norm  of  this 
rank  one  perturbation  are  equal  to  the  lower  bound  in  Theorem  4.6.  A  little 
manipulation  yields  the  expression  in  the  statement  of  the  lemma.  This 
proves  that  the  expression  in  the  statement  of  the  lemma  is  equal  to  both 
diss^O]  ,  a2  ,  path)  and  dissf  (oj ,  a2  •  path).  Since 

diss(ai  ,  a2  ,  pafh.)idiss(ff1 ,  a2  ,  rtgian),  the  lemma  follows.  Q.E.D. 

The  next  example  of  the  lower  bound  being  nearly  sharp  is  the  matrix  T 
with  a  1  by  1  block  A=[a ],  an  n  by  n  diagonal  block  2?sdiag(bt),  and  a  1  by  n 
block  C=[ct . c*]: 
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a  ct  •  cn 
r 

T  = 

V 

We  need  to  use  this  example  in  chapter  8  when  we  show  that  our  new  lower 
bound  is  likely  to  be  sharp. 


The  geometry  of  this  example  is  simple.  sepx(X,5)=min|  a—  6*  (/  2;  say 
the  minimum  occurs  at  i=i§.  ^Ii  P\\  2-l=max|  c*/  (ot  — i>t)  | ;  say  the  max 
occurs  at  i=ip.  |  a  -fe<5  j  gives  the  distance  from  a  to  the  closest  eigenvalue 
of  B.  and  ||  P\\  gives  the  maximum  instantaneous  speed  at  which  a  can  move 
under  perturbations  in  T  [Kato2].  Therefore  it  makes  sense  that  the  smal¬ 
lest  perturbation  needed  to  make  a  hit  an  eigenvalue  of  B  be  approximately 
distance/speed  =  |o— bis\/ ||  F||  =  sepx/ (2- 1|  P|j ).  The  algebra  is  quite 
messy,  but  the  proof  is  similar  to  that  of  lemma  4.8:  pick  a  unitary  matrix  Q 


cost?  Esind 
[surd  Ecosd 


(d  is  a  real  angle  and  |  <u|  =1)  so  that 


1  Nf'* 

6  a+big-big 

represents  a  complicated  expression  that  is  not  important,  and  6  is  the  per¬ 


turbation  which  makes  the  eigenvalue  a  move  to  6^.  The  exact  formula  for 
1 6 1  is  rather  complicated,  but  it  is  easy  to  see  that  when  a  is  much  closer  to 
than  bip,  1 <5 1  cannot  exceed  the  lower  bound  in  theorem  4.6  by  more  than 


a  small  factor. 


From  lemmas  2.1  and  2.3,  we  see  that  if  B  can  be  diagonalized  by  a  simi¬ 
larity  of  condition  number  c,  then  the  lower  bound  can  not  be  too  low  by 
more  than  a  factor  of  k2.  Thus,  it  should  come  as  no  surprise  that  our  first 
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example  of  when  the  lower  bound  is  not  sharp  has  A  1  by  1  and  B  an  n  by  n 
Jordan  block,  since  a  Jordan  block  is  the  "least  diagonalizeable"  matrix  of  all: 


10  1 

e  1 


r  = 


(4.7) 


l  «J 

We  will  see  that  the  upper  bound  sepx(^4,5)=9(eB),  the  lower  bound  =0(e2n). 
and  diss2(ff1  ,  <x2)=0(en+l):  thus,  neither  the  upper  nor  the  lower  bound  is 
asymptotically  correct,  but  the  upper  bound  is  a  much  better  estimate  than 
the  lower  bound  for  large  n.  The  proof  that  diss^Oi  .  o2)=0(rn4‘1)  will  follow 
from  considering  perturbations  only  in  the  lower  left  corner,  and  the  proof 
that  diss^o,  ,  o2)=Q(en+1)  (i.e.  dissect  ,  o2)  is  decreases  no  more  quickly 
than  en+1)  will  follow  from  analyzing  the  characteristic  polynomial  of  T+E. 
where  E  is  a  general  perturbation. 

That  sep)JiA.B)-B(en)  follows  from  Lemma  2.11.  Similarly,  it  is  easy  to 
see  that  |[  P\\  =0(e~")  so  that  our  lower  bound  on  di ss2(o,  ,  ae)  is  ©(r2"). 


To  estimate  diss2(a1  ,  ag),  we  consider  perturbations  in  the  lower  left 
corner.  A  simple  computation  shows  that  if  we  change  r„* j.i  from  0  to 


-1 

1  1 

n 

n+1 

n+l 

,»+i 


then  the  0  eigenvalue  and  one  of  the  eigenvalues  at  e  coalesce  at  e/  (n  +  1).  To 
shew  that  dissgCffj ,  02)  cannot  be  of  order  c*  for  x>n+l  we  consider  the 
characteristic  polynomial  of  T+E-e*,  where  ||  E\\  =0(1).  If  E- 0  the  charac- 
teristic  polynomial  is 


det(Xf-T)  =  t  ETTGlH-eV  s  *!£■»<*)* 

JwQ  \n~7  I'J  !  jmQ 

where  <ij(e)  is  a  polynomial  in  e  with  lowest  order  (dominating)  term  e1**1-* 
for  j>  1  and  a0=0.  It  is  easy  to  see  that 
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det (\J-T-E)  =  Y'  ■■<(*)** 

/•o 

where  a'j(e)  has  the  same  dominating  term  as  o,(r)  for  j»l  and  a  0 
By  changing  variables  to  u=A/e  the  charactertstic  polynomial  becomes 

det(A/-7*-£)  =  e"*‘  -•  m 

where  p(0)=0  and  its  remaining  coefficients  are  0(  1 J  Oe  •  f  x  >*  *  1  then 

the  eigenvalue  at  A=0  remains  isolated  from  the  eigenvalue*  at  **»  (u  *  1)  by 
the  continuity  of  the  roots  of  a  polynomial  as  functions  of  the  coefficients  If 
x=n  +  l  then  this  argument  breaks  down  and  indeed  we  haw  displayed  a  per¬ 
turbation  of  that  magnitude  that  makes  eigenvalues  coalesce 

The  next  kind  of  example  that  shows  that  the  lower  bound  of  theorem 
4.6  can  be  low  depends  on  dim(v4)  and  dim (B)  both  being  at  least  2.  The  idea 
is  that  sepx  will  be  small  and  ||  P\\  large  because  of  nonoverlapping  parts  of 
the  spectra  of  A  and  B.  In  other  words,  sepx  will  be  determined  by  A,(j4)  and 
Ai (B),  and  ||  P\\  by  Aa(4)  and  A 8(f?).  It  will  turn  out  the  diss*(o,  .  o2)  will  be 
the  smallest  perturbation  that  either  makes  Ai(.<4)  coalesce  with  A,(f?)  or 
Ae(/4)  coalesce  with  A8(F).  This  example  will  lead  to  a  systematic  improve¬ 
ment  on  theorem  4.6  obtained  by  considering  all  possible  partitions  of  ot  and 
oe;  it  illustrates  the  combinatorial  complexity  of  the  problem. 

Let 

-M 

Simple  computations  yield  sepx(i4.i?)=e/2  and  ||  P\\  =  V(l+e)/e,  providing 
a  lower  bound  on  dissg((7| ,  og)  of  9(ev*)  and  an  upper  bound  of  e/2.  We  will 
show  that  the  upper  bound  is  a  much  better  estimate  of  diss^?! ,  o8). 
Clearly,  to  make  Oi={— 1  ,  1}  coalesce  with  c8=|l+e  ,  -1+VeJ,  either  }lj  has 


T  = 


1  0 
-1  1 
1+e 

-lWe 
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to  coalesce  with  $-1  ,  -1+Vc  ,  1+eJ,  or  J-1J  has  to  coalesce  with 
{1 .  1+e  ,  -1+Vcj.  Thus 

diasa(ffi .  o2)  ^  min  (dissg($l{ ,  {-1 ,  -1+Ve  ,  1+cJ) .  dissg(j-l{  .  {1  ,  1+e  ,  -1+Ve})) 
=  (2"’  2(1+VI+I))  =  2(l+VT+i)  ’ 

In  other  words,  diss2((r,  ,  o2)  is  determined  by  the  size  of  the  smallest  pertur¬ 
bation  that  makes  —1  coalesce  with  —  1+v/e;  the  rest  of  the  spectrum  is 
irrelevant. 

In  this  example  we  used  the  fact  that  the  lower  bound  of  theorem  4.6  is 
exact  for  2  by  2  matrices,  but  this  was  not  necessary.  In  fact,  if 
Ei  a  Jan,..,ffijJ  is  any  partition  of  0\.  the  above  argument  shows  that 

dissgCff! .  ©a)  min  dissgCo^  , 

since  some  eigenvalue  from  some  ou  must  coalesce  with  something  in  its 
complement  a-a If  we  consider  all  possible  partitions  E,  of  Oi  (including 
the  trivial  partition  ja^)  and  similarly  all  partitions  Eg  of  oz  we  obtain  the  fol¬ 
lowing  equality: 

Theorem  4.9:  Let  a *  and  E*  be  as  above  for  i  =  l,2.  Then 

dissect ,  o2)  =  max  (  max  min  diss2(aH  ,  o-cu) ,  max  min  dissz(a-agj  .  ffg^) ) 

Proof:  That  the  right  hand  side  is  no  greater  than  the  left  hand  side  follows 
from  the  previous  discussion.  Equality  must  hold  because  diss2(a1  .  <x8)  is  one 
of  the  candidates  of  the  maximum  on  the  right  Q.E.D. 

If  we  substitute  the  lower  bound  of  theorem  4.6  for  each  dissg  expression 
on  the  right  hand  side  of  the  last  equation,  we  obtain  an  improvement  of  the 
theorem,  as  illustrated  by  the  last  example. 
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Chapter  5:  How  Far  Apart  can  the  Upper  and  Lower  Bounds  an  diss2(a, ,  cr2) 
Be? 

5. 1  Introduction 

Let  us  summarize  the  upper  and  lower  bounds  already  proven  in 
theorems  2.10  and  4.6  (we  assume  T  has  the  structure  shown  in  (2.6),  with 
Oi=<j(A)  and  a2=a(B)): 

Theorem  5.1: 


sep \{A.B)  diss2(a!  .  o2  ,  path) 

%  diss2((71  ,  o2  .  region )  i 


(5.1) 


sepx(A.B) 


pii  wir^n8-! 


and 


V2  sep x(A,B)  fe  diss£(a1  .  <j2  .  path)  (5.2) 

z  .  .  sepx(A.F) 

sC  dissjp (a,  ,  Oo  ,  regxon)  «fc  - - 7=====-  . 

||P||  +  ^11  P\\  S  -  1 

In  this  chapter  we  will  analyze  how  far  apart  these  bounds  can  be.  We  will 
present  only  global  bounds,  valid  for  all  matrices  and  depending  only  on  the 
dimensionality.  Probabalistic  bounds,  which  show  when  the  upper  and  lower 
bounds  are  likely  to  be  close  together  or  far  apart,  are  presented  in  chapter 
8. 


Our  worst  case  analysis  starts  by  substituting  the  upper  bound  for  1 1  P 1 1 
of  lemma  2.5  in  inequality  (5.2)  ((5.1)  is  so  similar  to  (5.2)  that  we  will  not 
consider  it  further): 


V2  sepx(A,£)  diss *(a, .  <7g) 


sep x(A,B) 


1+ 


IICHt 

sep(j4,£?) 


sepx(i4.P)sepU,P) 
sepU.B)  *2-||C||* 


(5.3) 
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nepx(A,B)sep(A.B) 

**v{A.B)  +  Z-\\T\\n  ' 

We  normalize  by  taking  |j  7’||x=l  so  that  dissjj(<j,  ,  ca)  actually  measures  the 
relative  change  in  T.  With  this  normalization  sep<2  and  sepx^l.  so  the 
denominator  of  the  last  right  hand  side  must  lie  in  the  interval  [2,4]  and  is 
therefore  not  important: 

Corollary  5.2: 

VS  sepxU.fi)  *  diss*(ai  .  o8)  >  sepxU.fi)  sep(-4,fi)/4  .  (5.4) 

For  this  coarsening  of  (5.2)  to  be  realistic,  )|  fi||  must  be  near  its  largest  pos¬ 
sible  value  l+|j  r||  g/ sep.  |]  P\\  can  fail  to  be  near  its  bound  because  ||  C |]  g 
is  much  less  than  |)  TH*;  |j  C\\  g  is  a  kind  of  generalized  measure  of  nonnor¬ 
mality  of  T  with  respect  to  the  partitioning  [Oj.ffzJ.  and  contributes  to  the 
bound  in  the  simple  way  shown  above.  The  way  in  which  the  upper  and  lower 
bound  can  differ  greatly  is  for  sep  to  be  small.  We  know  from  lemma  2.12 
that  2-sepx  is  an  upper  bound  on  sep;  the  question  this  chapter  asks  is  how 
much  smaller  can  sep  be  than  sepx? 

The  results  of  this  chapter  are  as  follows  (in  this  chapter  we  will  make 
the  convention  that  nA  a  dimU)idim(fi)  a  ng).  We  show  that  sep  can  be  no 
smaller  than  a  constant  multiple  of  sep**.  This  means  that  the  lower  bound 
in  (5.2)  can  be  no  smaller  than  a  constant  multiple  of  the  dimU)+-l-st  power 
of  the  upper  bound.  We  present  examples  where  the  lower  bound  is  actually 
the  cube  of  the  upper  bound,  and  other  examples  where  it  is  the  square  and 
either  the  upper  or  lower  bound  may  be  the  more  accurate  measure  of 
dis8j(<7]  ,  02  ,  path).  Since  |j  fi||  provides  an  upper  bound  on  sep  (from 
lemma  2.5),  we  may  translate  our  results  into  upper  bounds  on 
diisj(<7i ,  o8  >  fiofh)  depending  on  j|  P|| .  These  results,  though  necessarily 
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weaker  than  the  results  depending  on  sep  and  sepx,  reproduce  results  found 
in  the  literature. 

The  remainder  of  this  chapter  is  organized  as  follows.  Section  5.2  sur¬ 
veys  historical  results.  Section  5.3  handles  the  case  dim(.A)=l  which  works 
out  especially  simply.  Section  5.4  discusses  the  general  case  dim(yl)&2. 
Finally,  in  section  5.5,  we  compute  dissafffj  ,  Og)  and  dissj-(c,  ,  Og)  exactly  for 
normal  matrices,  in  which  case  the  path  and  region  dissociations  coincide. 

5.2  Surrey  of  the  literature 

As  stated  in  the  introduction,  the  literature  has  concentrated  on  using 
||  P\\  for  upper  bounds  on  dissgCffj  ,  o2  ,  path)  and  diss£(a1  ,  a2  .  path).  Here 
we  mention  three  previous  works,  by  Ruhe  [Ruhel],  Wilkinson  [Wilkinson3], 
and  Kahan  [Kahanl].  Since  Kahan's  result,  stated  below,  essentially  implies 
the  other  two  results,  we  discuss  it  first. 

Kahan  proves  the  following  theorem; 

Theorem  (Kahan):  If  ||  P||  >y/nA  + 1  then 

diss2(al .  az  ,  path.)  1.22 

I!  r||  s  ■ 

We  prove  a  stronger  result  in  section  5.4,  essentially  replacing  ||P||  by  an 
upper  bound  depending  on  sep.  Kahan's  proof,  which  is  totally  different  than 
ours,  could  however  be  modified  to  use  sep  instead  of  ||P|| .  This  modified 
proof  yields  the  insight  that  if  the  R  attaining  the  infimum  in  definition  2.4  of 
sep  has  well  separated  singular  values  (i.e.  some  near  ||P||,  the  rest  near 
zero),  then  the  exponent  -1/nA  of  ||P||  appearing  in  the  bound  can  be 
replaced  by  -1,  but  we  do  not  pursue  this  approach  further,  since  it  would 
lead  to  a  probabilistic  analysis  similar  to  that  of  chapters  6  and  7. 
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Wilkinson's  result  [Wilkinson3]  only  covers  the  case  n4=l  and  is  essen¬ 
tially  the  same  as  Kahan's  result.  Ruhe's  bound  is  on  the  distance  from  a 
given  diagonalizable  matrix  to  the  set  P  of  matrices  with  at  least  one  multi¬ 
ple  eigenvalue,  and,  by  an  abuse  of  notation,  may  be  written 


max  lx*-** | 

dist*(7\P)  £  ~  -  .  t  .tz  -  . 

4  y/(max 

and  is  weaker  than  the  previous  two  results  in  that  it  has  a  higher  root  of 
IIP  1 1  in  the  denominator.  It  does,  however,  explicitly  depend  on  the 
difference  of  eigenvalues,  which  may  be  small  even  if  ||  P|j  is  not  large.  This 
advantage  is  shared  by  VSsepx  as  an  upper  bound. 

5.3  The  Case  dim(A)=l 


This  case  is  particularly  simple;  diss^ (ffj  ,  a2)  is  the  dissociation  between 
a  simple  eigenvalue  and  the  rest  of  the  spectrum.  From  lemma  2.11  we  know 
that  sep  and  sepx  cannot  differ  by  more  than  a  factor  of  2: 


sepxU.P)  i  sep04.27)  =  •  /)  s  2  sepx(.A.J9) 

so  that  inequality  (5.4)  becomes 

V2  a  mini  B -a  ■  I)  2  diss^ff, ,  oz)  s»  -jg-crjLn(£-,o  /) 

Thus,  the  lower  bound  can  behave  at  worst  like  the  square  of  the  upper 
bound  (recall  that  ||  r||  g=  1  so  that  all  bounds  are  on  the  relative  error).  The 
distance  between  these  bounds  cannot  be  decreased,  as  two  examples  of  the 
last  chapter  have  shown.  Lemma  4.8  shows  that  for  a  2  by  2  matrix  the  lower 
bound  in  (5.2)  is  sharp.  On  the  other  hand,  the  example  in  equation  (4.7), 
which  also  had  a  two  point  spectrum,  shows  that  the  upper  bound  in  the  last 
equation  is  more  accurate  for  a  case  where  the  upper  and  lower  bounds 
differ  by  a  square.  Thus,  we  cannot  hope  to  improve  the  bounds  in  (5.2)  much 
if  we  only  use  measures  like  sep*.  ||  P||  and  similar  global  measures.  We  will 
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see  in  chapter  8,  though,  that  the  lower  bound  of  (5.2)  will  be  accurate  unless 
T  falls  into  a  set  of  small  probability. 

Since  lemma  2.5  shows  that  jj  Pj|  provides  a  lower  bound  on  sep  and 
hence  sepv  we  can  change  the  upper  bounds  in  (5.1)  and  (5.2)  to  upper 
bounds  in  terms  of  1 1  P  j  | : 

Lemma  5.3:  Let  dim(.4)=l.  Then 

vprfsrf*  •  *"***> 

^y1|T^  diss2(ffi .  a2  . path) 

Proof:  Follows  immediately  from  lemma  2.5,  theorem  2.9,  and  lemma  2.11. 
Q.E.D. 

This  yields  the  results  of  Kahan  [Kahanl]  and  Wilkinson  [Wilkinson3]. 

5.4  The  Case  dim(.A):s2 


The  goal  of  this  section  is  to  prove 


Theorem  5.4:  Let  T  have  the  structure  of  (2.6),  and  assume  ])  T\\  j=l.  Then 


2  •  sepi(A,B)  sep(A.B)  2; 


sep  k(A,B) 


nA  (1  +  sep^M.tf))"1''1  nA 


sepx(A.B) 


In  other  words,  sep  can  not  be  any  smaller  than  some  constant  multiple  of 


sep"1.  After  proving  this,  we  will  give  an  example  showing  that  sep  can  indeed 
be  as  small  as  sep|.  We  believe  that  this  is  worst  case  behavior,  but  have  not 


been  able  to  prove  it. 


Proof:  The  first  inequality  in  the  theorem  is  just  lemma  2.11.  The  proof  of  the 
other  inequalities  are  very  simple  given  the  expression  sep(A.B)=||  't'jjjijl  “l 
from  definition  2.4.  This  expression  means  that  an  upper  bound  on 
provides  a  lower  bound  on  sep(A,5).  To  compute  such  an  upper  bound,  con- 
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aider  the  following  form  of  ^ba  *or  nA  =3>  which  is  analogous  to  the  expres¬ 
sion  in  (2.9): 


Thus, 


Iff— an  /  — azl  I 


*BA  = 


B  —a  22  I 


-an  I 
-a  sb-/' 


5-033- /] 


(5-an-/)-1 


*BA= 


aai(5-an  /)-1(5 -a.32  I)~l 
(5-oa  /)-» 


W^B-a^ D-KB-an  iy^B-a^ I)-l+a31(B-an ■/)~l(B-asa- f)~l 

&at(B—i B22'/)-1(5— Oss-/)-1 
(5-038/)'* 

The  largest  number  of  (5-a*  7)-1  terms  that  are  multiplied  together  is 
three,  and  they  appear  in  the  upper  right  corner  of  ♦ ba ■  Similarly,  for  any 
n4.  the  largest  product  of  (B -a*  /)-1  terms  contains  nA  of  them  and  occurs 
in  the  upper  right  corner  of  *§)*•  Now  since  ||  (5-aft  7)_1!|  _1  is  clearly  an 
upper  bound  for  sepx  for  any  i,  ||  (5-a*  7)-1||  is  a  lower  bound  for  sepx  1  and 
so  the  norms  of  the  block  entries  of  "ijU  bounded  above  by  sep*""4  times 
a  constant.  Thus  ||  ♦5*1 1|  is  itself  bounded  above  by  sep^"4  times  some  con¬ 
stant  c*"1  depending  only  on  n4 .  Taking  reciprocals,  we  see  that 


sep  =  ||  *f*i||  '*  i  c^  sepT4 

as  desired. 


More  precisely,  we  use  the  block  matrix  norm 

II  *11*  *  maxi  £  11%  II 
i 

where  %  is  a  square  subblock  of  if.  If  there  are  n4  blocks  %  in  any  row  or 
column  of  if,  then  it  is  straightforward  to  show  that 


11*11  *  nA  '  (1*11* 
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For  example,  when  nA  =  di m(M).  then  ||  M\\ *  is  the  usual  infinity  norm  (max¬ 
imum  absolute  row  sum  norm)  of  M  ■ 

Now  we  estimate  ||  '{'sii  II  •  Since  1  is  a  common  upper  bound  for  the 
magnitudes  of  the  off  diagonal  elements  of  A,  the  sum  of  the  upper  bounds  of 
the  norms  of  the  blocks  in  the  first  row  of  is 

11**2.11  •  ll**2.IU 

nj-2 

<nA  (  sep*1  +  L  se P a®  (*  +  sep*-1)*  ) 

i*0 

=  nA  (  sep* 1  (1  +  sep*-1)”4"*  ) 

<  nA  2n'*“1  sepx~"4 

(since  sep*  <  1).  This  is  clearly  also  an  upper  bound  on  the  sum  of  norms  of 
the  blocks  in  the  other  rows  too.  Q.E.D. 

Since  we  are  intertested  in  the  case  when  sep*  is  very  small,  bounding  it 
by  1  as  we  did  in  the  last  equation  is  rather  conservative.  A  more  realistic 
bound,  true  asymptotically  for  small  sep*.  may  be  derived  as  follows:  from 

the  expression  for  'ig]A<  we  see  that  the  coefficient  for  the  ]Q  (B—  a*  7)-1 

n,-i 

(sep*-*4)  term  is  a,«.u.  Since  the  ay  satisfy  ^loyl^l  we  see  that  the 

««l  a 

last  product  can  be  at  most  (n^-l)-^*4’1^2,  implying  that  for  small  sepx,  sep 
is  approximately  bounded  below  by  (n4-l)*"4_1)/2  sep"4. 

As  in  the  dim{A)=l  case,  knowledge  of  |j  F]|  provides  a  lower  bound  on 
sep  which  in  turn  provides  an  upper  bound  on  diss^ (oj  ,  oe  •  path). 

dissjfjj ,  az  ,  path)  <  VS  sep*  s  VS  ( nA  2"4”1  sep  )I/n4 
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l/nA 


W  3.40(  ■ 


Thus,  we  have  an  upper  bound  on  dissj(a1 


9 


to  the  nA  -th  root  of  1/  | )  P  | 


l/nA 

,  (Tg  ,  path)  essentially  proportional 


Theorem  5.4  improves  results  of  Kahan  [Kahanl],  Ruhe  [Ruhel],  and  Wil¬ 
kinson  [Wilkinson3]. 


Now  we  present  an  example  to  show  that  sep  can  indeed  be  as  small  as 
sep£.  Consider  the  n  by  n  matrices 


and  B  =  ~AT  . 


I  eJ 

A  simple  computation  shows  that  the  upper  right  corner  of  (the  largest 
element)  is 


Cl-Zn 

2V7r(n-l) 

(by  Stirling's  formula)  which  means  sep  is  bounded  below  by  the  reciprocal 
of  this  quantity  and  above  by  n8  times  the  reciprocal.  Symmetry  considera¬ 
tions  show  that  \=0  is  the  value  which  attains  the  minimum  in  the  definition 
of  sep*.  Since  the  largest  entry  in  A~l  is 

we  see  that  sep*  is  bounded  below  by  eB  and  above  by  nen.  Thus,  for  fixed  n 
and  e  approaching  zero,  we  see  that  sep  is  bounded  below  by  sep*. 
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5.5  diss^ffj .  o2)  and  diss*(a,  .  a2)  for  Normal  Matrices 

For  normal  matrices  it  turns  out  that  we  can  compute  diss^ffi  .  <x2)  and 
diss*(ci  .  <x2)  exactly.  This  is  because  a  matrix  is  normal  if  and  only  if  if  can 
be  diagonalized  by  a  unitary  similarity.  Thus,  the  upper  triangular  form  of  T 
that  has  been  our  starting  point  is  actually  diagonal,  and  all  projectors  are 
orthogonal  and  hence  of  norm  1.  Thus,  the  upper  bound  and  lower  bound  we 
have  been  comparing  in  this  chapter  are  e~ual  and  we  have 

Theorem  5.5:  If  T  is  normal,  then 

diss2(o1 .  ff2  .  path)  =  diss2(ffi  .  o2  .  region) 
s  dissf  (<7i ,  o2  ,  path)  =  diss^(fft .  <x2  ,  region) 

min  |  X2  —  X2 ! 

_  _ 

2 

Proof:  The  expressions  for  diss^^  .  o2  ,  region)  and  diss2(aj  .  a2  .  path)  follow 
from  the  discussion  of  the  previous  paragraph.  Now  let  T'  be  the  2  by  2 
diagonal  submatrix  of  T  containing  \'j  and  \'2  on  its  diagonal,  where  \  l  and 
attain  the  minimum  in  the  statement  of  the  theorem.  The  claims  for 
diss*(o,  .  <t2  .  region)  and  diss*(a,  ,  e2  ,  path)  follow  directly  from  applying 
lemma  4.8  to  T'.  Q.E.D. 

The  perturbation  of  lemma  4.8  has  several  interesting  properties:  from 
the  construction  of  lemma  4.8,  we  see  that  the  perturbed  matrix  T  is  defec¬ 
tive.  and  hence  not  normal.  Furthermore,  even  if  T  is  real  the  T  may  not  be, 
since  by  perturbing  only  two  eigenvalues,  the  eigenvalues  of  T'  may  very  well 
no  longer  occur  in  complex  conjugate  pairs,  which  means  it  could  not  be 
real.  Furthermore,  no  other  perturbation  of  minimal  Euclidean  norm  can 
yield  a  normal  T'  because  the  Wielandt-Hoffman  theorem  for  normal 
matrices  [Wilkinson2j  shows  that  Jj  T-7”)i  g  must  be  at  least  >/2sepv  We  exhi- 
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bit  such  a  perturbation  in  the  next  paragraph. 

If,  however,  we  measure  perturbations  with  ||  |)  instead  of  })  • ))  g,  we  can 
find  a  minimum  norm  perturbation  yielding  a  T'  which  is  normal,  although  it 
may  not  be  real  if  T  is.  We  accomplish  this  by  using  a  rank  2  perturbation 
instead  of  the  rank  1  perturbation  of  lemma  4.8.  Simply  observe  that  ST  in 


has  2-norm  |a-6  |/2  =  disse(a1 ,  o2).  Euclidean  norm  V2disSf(oi ,  eg).  and 
rank  2.  Applying  this  construction  to  the  two  closest  eigenvalues  \\  and  X'2  of 
o,  and  Og  clearly  produces  a  normal  T\  This  7”  may  not  be  real  if  7  is,  how¬ 
ever.  This  may  occur  if  X't  and  X'e  are  complex  but  not  complex  conjugate 
pairs,  because  then  the  eigenvalues  of  T  will  not  occur  in  complex  conjugate 
pairs,  a  necessary  condition  for  being  real.  Sometimes  we  can  still  find  a  real 
T  if  this  happens:  if  both  X’t  have  nonzero  imaginary  parts,  then  perturb 
their  conjugates  X'i  and  X’g  to  coalesce  also.  Since  the  eigenvector(s)  belong¬ 
ing  to  any  X  is(are)  the  complex  conjugate(s)  of  the  eigenvector(s)  belonging 
to  X,  these  two  rank  2  perturbations  are  easily  seen  to  be  complex  conju¬ 
gates  so  their  sum.  the  total  perturbation,  is  real.  If,  however,  one  of  the  X'( 
is  real,  we  may  not  be  able  to  find  a  real  T':  consider 


with  CjsfOj  and  c2=|i,-i  J.  The  only  way  for  0]  and  c2  to  overlap  and  still  have 
complex  conjugate  pairs  is  for  all  three  eigenvalues  to  be  real.  By  the 
Wielandt-HoffRxan  theorem  [Kato2]  this  requires  a  perturbation  of  Euclidean 
norm  at  least  VZ,  and  hence  a  2-norm  of  at  least  whereas 
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If  T  is  Hermitian,  we  can  say  still  more.  Applying  what  we  said  about  nor¬ 
mal  matrices,  we  see  a  minimum  jj  -Dj-  perturbation  yields  a  defective  and 
hence  nonhermitian  T\  but  T  is  clearly  real  if  T  is.  The  rank  2  perturbation 
above  also  produces  a  Hermitian  T'  which  must  be  real  if  T  is. 
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Chapter  6:  A  Probabilistic  Model 
6. 1  Introduction 

In  the  last  chapter  we  presented  upper  and  lower  bounds  on 
diss^(a1  ,  ff2)  and  examples  which  showed  that  they  could  be  equal  or  arbi¬ 
trarily  far  apart.  We  did  not  provide  any  insight  as  to  when  they  were  likely  to 
be  close  or  distantly  separated.  We  will  provide  this  insight  in  this  chapter 
and  the  next  by  filling  in  the  details  of  the  following  description:  There  is  a 
surface  in  the  space  of  matrices  such  that  our  upper  and  lower  bounds  on 
diss^O]  ,  az)  are  far  apart  only  for  matrices  within  a  small  relative  distance  e 
of  the  surface.  (We  say  that  a  matrix  U  is  within  relative  distance  £  of  a  sur¬ 
face  if  there  is  a  6M  such  that  M+6M  is  on  the  surface  and 
II  $-W|l  E  s  E'll  H ||  g.)  In  fact,  we  will  see  that  the  closer  to  the  surface,  the 
farther  apart  the  bounds.  Furthermore,  we  can  compute  an  asymptotic 
upper  bound  n(n+l)(Ti— l)8  e8  on  the  fraction  of  the  volume  of  the  set  of 
complex  matrices  that  lie  within  relative  distance  e  of  this  surface  (n  is  the 
dimension  of  the  matrix).  This  upper  bound  shows  that  the  volume  of  the  set 
of  points  within  e  of  the  surface  goes  to  zero  as  e  goes  to  zero.  If  we  interpret 
this  fraction  of  the  volume  of  a  set  of  matrices  as  its  probability  then  we  may 
state  our  result  as  follows:  the  probability  that  the  ratio  of  the  upper  bound 
to  the  lower  bound  is  at  most  K>1  is  at  least  l-n(n  +  l)(n-l)8  /P8  +  o(/P8). 
In  other  words,  the  ratio  of  the  bounds  is  large  with  a  low  probability.  This 
same  sort  of  description  applies  to  our  bounds  for  sepx  in  terms  of  sep,  (the 
bound  is  accurate  except  within  a  small  distance  of  a  particular  surface  in 
matrix  space,  and  the  volume  of  points  within  e  of  the  surface  goes  to  zero  as 
a  polynomial  in  e),  to  the  probability  of  being  able  to  completely  diagonalize 
a  given  matrix,  and  to  other  similar  quantities  which  we  will  discuss  in 
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chapter  7. 

These  results  will  follow  from  a  more  general  theorem  which  we  will 
prove  in  this  chapter.  The  general  problem  is  estimating  the  volume  of  points 
within  distance  c  of  certain  surfaces:  homogeneous  varieties.  A  homogene¬ 
ous  variety  is  the  locus  of  solutions  of  a  set  of  simultaneous  polynomial  equa¬ 
tions  which  have  the  property  that  if  (x,,  ....  x*)  lies  on  the  surface,  so  does 

(ax . . axk)  for  any  scalar  a.  Since  we  are  interested  in  relative  distance, 

it  suffices  to  estimate  the  volume  of  points  on  the  sphere  of  matrices  of  Fro- 
benius  norm  1  that  lie  within  distance  e  of  a  homogeneous  variety.  It  is  a 
remarkable  fact  that,  for  complex  matrices,  this  volume  can  be  computed  to 
first  order  in  terms  of  only  two  parameters  of  the  variety:  its  dimension  and 
its  degree.  The  main  result  of  this  chapter  is 

Theorem  6.3:  Let  V  be  a  complex  purely  2n  dimensional  homogeneous 
variety  of  degree  deg(V)  in  C*.  with  n>0.  Then  the  fraction  of  unit  sphere  in 
CP  within  Euclidean  distance  e  of  V  is 

(nil  )  *e«(V>  e^-n)  + 

(  (nil )  i*  binomial  coefficient  (Af-1)!/  ((n— l)*  (Af-n)!)). 

Interpreting  the  fraction  of  area  as  a  probability,  we  may  restate  this  as 

Prob(distj(Af ,  V)  <  e)  =  (£lj  )  deg(F)  e +  o (*«*-•>)  . 
where  U  is  uniformly  distributed  on  the  surface  of  the  unit  sphere. 

Thus,  to  first  order  the  probability  depends  on  only  two  parameters  of 
the  variety:  the  dimension  and  the  degree.  This  simple  result  makes  it  easy 
to  compute  the  probability  in  many  interesting  cases;  we  discuss  singular 
matrices,  defective  matrices,  and  polynomials  with  multiple  roots  in  the  next 
chapter. 
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We  may  extend  this  result  to  real  varieties,  but  we  only  obtain  an  upper 
bound  on  the  probability: 


Theorem  6.6:  Let  V  be  an  n -dimensional  homogeneous  real  variety  of  degree 
deg(y)  in  R^,  where  n>0.  Then  the  fraction  of  the  unit  sphere  in  R*  within 
Euclidean  distance  e  of  V,  (or  equivalently  Prob(dist*  (A/  .  V)  ss  e)  where  ti  is 
uniformly  distributed  on  the  sphere),  is  less  than  or  equal  to 


IX  ) .  rx  g~) 


y=y- 1 


r( 


N- 


^-)  rx^)  r<2-) 


deg(V)  e**1*  +  o(t*-»)  (e.i) 


In  Section  8.2  we  will  define  the  terminology  just  used,  and  state  the 
theorems  from  geometry  and  algebra  we  need  to  prove  our  result. 
Specifically,  we  will  use  Weyl’s  theorem  [Weyl,  Griffiths]  on  volumes  of  spheri¬ 
cal  neighborhoods,  and  Lelong's  theorem  [Lelong,  Thie,  Griffiths]  on  the  area 
of  a  homogeneous  variety.  In  Section  8.3  we  state  and  prove  the  main  result, 
and  show  that  the  probabilistic  interpretation  holds  for  a  large  class  of  pro¬ 
bability  distributions  on  the  set  of  matrices.  In  Section  8.4  we  state  BAzout’s 
theorem  [Kendig]  on  the  degree  of  intersection  of  varieties,  which  we  use  to 
compute  upper  bounds  on  deg(V).  In  Section  6.5  we  extend  our  results  to 
real  varieties  by  using  Crofton’s  formula  [Santalb,  Griffiths]. 

Our  approach  was  motivated  by  a  similar  analysis  of  varieties  in  the 
space  of  polynomials  due  to  Smale  [Smale]. 


6.2  Notation  and  Lemmas  from  Geometry  and  Algebra 

To  prove  our  main  result  we  will  need  several  theorems  from  geometry 
and  algebra.  The  central  result  we  need  from  geometry  is  WeyTs  theorem 
[WeyLGrilfiths]  which  says  that  the  volume  of  a  spherical  neighborhood 
(defined  below)  of  a  manifold  of  radius  e  is  well  approximated  by  a  polynomial 
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in  e  for  small  e.  The  dominating  (lowest  order)  term  of  this  polynomial  con¬ 
tains,  not  surprisingly,  the  area  of  the  manifold  as  a  factor.  Our  central  alge¬ 
braic  result  is  that  the  area  of  that  part  of  a  complex  homogeneous  variety 
within  the  unit  ball  is  equal  to  the  degree  of  the  variety  times  a  constant 
which  depends  only  on  the  dimension  of  the  variety  [Lelong.  Thie,  Griffiths]. 
We  will  present  several  ways  of  computing  the  degree  of  a  variety  in  Section 
6.4. 

Before  discussing  Weyl's  theorem,  we  need  several  definitions  from 
geometry.  (See  [Guillemin  and  Pollack]  for  more  details).  A  subset  M  of 
Euclidean  space  R^  is  called  an  n'dixnensumal  manifold  if  it  is  locally 
home omor phic  to  Iff*.  m=N-n  is  called  the  codimension  of  M  and  is  denoted 
codim(Af).  M  is  called  a  smooth  manifold  if  the  homeomorphism  and  its 
inverse  are  infinitely  differentiable  and  an  analytic  manifold  if  they  are  ana¬ 
lytic. 

By  l-wtlume  of  an  n-dimensional  manifold  M  (Ifcn)  we  mean  the  £- 
dimensional  Lebesgue  measure  of  M,  if  it  exists.  Note  that  if  l  >n  this  volume 
is  zero.  The  notation  vol(Af)  denotes  the  n-volume  of  the  n-dimensional 
manifold  M  ■ 

An  e-tubular  neighborhood  of  U,  denoted  is,  loosely  speaking,  the 

set  of  all  points  of  R^  within  Euclidean  distance  e  of  M.  More  precisely,  it  is 
constructed  as  follows:  for  each  x  elf,  consider  the  space  of  all  vectors  start¬ 
ing  at  z  and  perpendicular  to  if  (the  normal  space  at  z).  The  endpoints  of  all 
such  vectors  starting  at  z  and  of  length  at  most  e  form  a  closed  disk  of 
radius  e  and  center  z  perpendicular  to  U\  the  union  of  all  these  disks  forms 
Moreover,  each  point  in  rt(M)  is  required  to  lie  in  exactly  one  such 
disk.  In  other  words,  it  must  be  possible  to  draw  exactly  one  line  segment 
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from  any  ycrt(Si)  to  Si  that  is  perpendicular  to  Si  and  of  length  at  most  e. 
This  constraint  means  that  some  manifolds  only  allow  tubular  neighborhoods 
for  e  smaller  than  some  bound;  for  example,  a  circle  of  radius  r  only  has 
tubular  neighborhoods  for  e<r,  because  otherwise  the  center  of  the  circle  is 
not  a  distance  rSt  away  from  a  unique  point  on  the  circle.  In  addition,  some 
manifolds  may  not  have  an  e-tubular  neighborhood  at  all  (see  figure  6.1). 

If  SI  lies  entirely  within  the  unit  sphere  S^-1  in  R^.  we  may  analogously 
define  an  e-spherical  neighborhood  of  Si,  denoted  rf(Si),  to  be  the  set  of 
points  of  S*-1  within  spherical  distance  e<rr  of  Si.  That  is,  the  length  of  the 
great  circle  connecting  any  y  €T?(Si)  to  the  nearest  point  x  €ii  is  at  most  e. 
We  construct  rf(M)  as  follows:  construct  the  normal  disks  to  Si  lying  in  R^  as 
above,  but  of  radius  2  sin  e/  2  rather  than  e.  Let  t?(M)  be  the  intersection  of 
the  sphere  S"-1  with  the  union  of  these  disks,  subject  to  the  same  con¬ 
straints  as  before.  Some  simple  trigonometry  shows  that  these  disks  inter¬ 
sect  the  sphere  in  points  at  spherical  distance  at  most  e  from  Si.  Note  that  e 
must  be  less  than  n  for  an  e-spherical  neighborhood  to  exist,  since  there  are 
two  geodesics  of  length  it  perpendicular  to  Si  connecting  any  xzSi  to  its 
antipodal  point. 

Now  we  can  state  Weyl’s  theorem  for  the  volume  of  e-spherical  neighbor¬ 
hoods.  Let 


1)/8 


(6.2) 


denote  the  surface  area  of  the  unit  sphere  ST*  in  fP"+1  (Le.  the  m -volume), 
and 


2nm/8 

^  *  mltm/2)  =  “w*-l/TTl 
denote  the  volume  of  the  unit  ball  in  Rf  [Santald]. 


(6.3) 
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Theorem  6.1  (Teyi):  Let  if  be  a  smooth  n -dimensional  manifold  in  S^-1,  and 
rf{M)  an  e -spherical  neighborhood  of  U.  Let  m-N —\—n  be  the  codimension 
of  U  (as  a  subset  of  Then 


vol(r/(Af))  =  Um-i  ■  E  . 

QM<n 
•  mvmx 


(6.4) 


where  the  A;, (if)  are  the  integrals  of  certain  differential  forms  over  M,  k0(M) 
is  the  volume  of  M,  and 


Ua'rt  fa 

r(  s-  1  (8-5) 

*'e'  ”  m(m.+  2)  •  •  •  (m+e— 2) 

(if  e  =0  the  denominator  in  J,  (e)  is  1). 

first  we  will  discuss  the  behavior  of  J,  (e)  as  a  function  of  e  and  e,  and 
then  we  will  discuss  the  geometric  meaning  of  the  theorem. 

It  is  possible  to  evaluate  the  integral  defining  J,(t)  exactly,  but  since 
later  approximations  render  the  higher  order  terms  in  e  useless,  we  only 
consider  its  behavior  for  small  e.  Thus,  we  need  only  examine  the  behavior  of 
the  integrand  for  small  r  as  well,  for  which  it  obviously  equals 


so  that 


T*  *m-i 
(l+r2)*/Z  = 


/  (r**"*"1  +  0(r,*m*1))  dr 
m(m  + 2)  •  •  •  (m+e -2) 


taij"'t'm 

m(m+2)  ■  •  • 


4— 7-r+  0(tan*+"+*e) 
(m+e) 


(the  last  equality  follows  since  tan  e  =  e+0(e3)). 
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Combining  this  last  estimate  with  Veyl's  theorem  yields: 

vol(T/(iO)  =  ^^-vo l(Jf)  cm  +  0(em* z)  (8-6) 

m 

=  Q"  vol(if)  tm  +  0(em+8)  . 

What  is  the  geometric  meaning  of  this  last  expression?  Take,  for  exam¬ 
ple,  N- 3  and  n=0;  this  corresponds  to  U  being  a  set  of  k  distinct  points  on 
the  unit  sphere  SP  in  BP.  vo1(t/(A0)  Is  then  just  the  area  of  k  spherical  caps 
centered  at  the  points  of  U  and  each  of  radius  e.  For  small  e,  this  area  is 
approximately  kite2,  which  is  what  (6.6)  gives  after  plugging  in 
m=N — 1-n  =2,  vol(Af)=/fc .  and  Zl^  =n.  Similarly,  we  may  take  N-3  and  n=l 
corresponding  to  a  one-dimensional  curve  U  of  length  l,  say,  on 
vol(r/(Af))  is  then  just  the  area  of  a  strip  surrounding  M  of  length  l  and 
width  2e,  which  is  clearly  approximated  by  21  t.  Plugging  m=l,  0^=2,  and 
vol into  (8.6)  yields  the  same  expression.  In  these  simple  examples 
Weyl's  theorem  tells  us  the  intuitive  fact  that  the  volume  of  the  spherical 
neighborhood  is  approximately  equal  to  the  volume  of  an  m-ball  of  radius  r 
(Qro  e")  times  the  volume  of  if;  Weyl's  theorem  extends  the  intuition  of 
these  small  examples  to  higher  dimensions. 

There  is  also  a  version  of  Weyl's  theorem  for  tubular  neighborhoods 
[Weyl,  Griffiths].  It  says  that  the  volume  of  L<  a  polynomial  in  t  of 

degree  at  most  N  (where  if  cH^),  and  that  the  lowest  order  (dominating) 
term  in  e  is  vol(Af)  eM“n  as  expected.  We  will  not  use  this  version  of 
Weyl's  theorem. 

Next  we  discuss  Lelong's  theorem  on  the  volume  of  a  homogeneous  pure 
dimensional  complex  algebraic  variety.  First  we  need  several  definitions  from 
algebraic  geometry.  (See  [Kendig]  for  more  details).  A  variety  V  is  the  zero 
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set  of  a  collection  \pa{.t\ . a^)}:  of  polynomials: 


y~K*  1 . *n)  I  P«(*l . 2n)=0  for  all  a$  . 

V  is  called  is  called  real  or  complex  according  to  whether  the  z*  are  real  or 

complex.  Since  V  can  in  general  have  points  of  self  intersection,  it  is  gen¬ 
erally  not  a  manifold,  since  it  is  not  homeomorphic  to  Euclidean  space  In  the 
neighborhood  of  an  intersection  point.  However,  points  q  with  relatively 
open  neighborhoods  UfcV  that  are  analytic  manifolds  are  dense  in  V 
[Theorem  4.2.4,  Kendig]  so  that  the  following  definition  makes  sense:  the 
dimension  of  Vat  p.  written  dim^H),  is 

dimp(H)  s  lim  sup  dim(C/g)  . 

!/f  a  manifold 

We  define  in  turn  the  dimension  of  V  as  the  maximum  over  all  peV  of 
dimj,(V0.  V  is  called  pure  dimensional  or  purely  n-dimensional  if  dim,,(  V)=n 
for  all  p  e  V.  When  we  refer  to  the  dimension  of  anything  in  this  thesis,  we  will 
always  mean  the  real  dimension,  i.e.  the  the  dimension  as  a  real  manifold  or 
variety,  rather  than  the  complex  dimension  often  used  for  complex  objects, 
which  is  exactly  half  the  real  dimension.  To  emphasize  that  we  are  dealing 
with  a  complex  variety,  we  will  write  its  real  dimension  as  2 n. 

We  call  V  homogeneous  if  it  is  a  cone;  that  is  if  (z,,  .  .  ,*B)eV  implies 

(a*j . azn)€H  for  all  scalars  a  (real  scalars  if  V  is  a  real  variety,  complex 

if  V  is  complex).  In  terms  of  the  defining  polynomials  [p0|  this  means  that  if 


P.(*  . . *n)  =  £*** *  •  •  •  xz^7  . 

then 

*■1 

where  (i  does  not  depend  on  j.  d  is  called  the  order  of  pm,  and  written 
order(p).  We  say  order  instead  of  degree  because  we  use  degree  to  for  the 
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more  general  concept  in  the  next  paragraph. 

We  define  the  degree  of  a  purely  2n -dimensional  homogeneous  complex 
variety  V  in  as  follows.  Let  be  a  2N-2n  dimensional  linear  mani¬ 

fold  (plane)  in  O^.  Since  dim(Xaw_an)  +  dim(V)  =  dim(Cw)  =  2N ,  we  say  that 
ajjd  y  are  Df  complementary  dimension.  Generically,  and  V 

will  intersect  in  a  surface  of  codimension  equal  to  the  sum  of  their  codimen¬ 
sions,  that  is  2 JV.  In  other  words,  their  intersection  will  be  of  dimension  0  (a 
finite  collection  of  points)  for  almost  all  planes  Law-8n.  It  turns  out  that  for 
almost  all  L2N~2n.  this  collection  will  contain  the  same  number  of  points,  and 
this  common  number  is  called  the  degree  of  V,  and  is  written  deg(V).  (see 
[theorem  4.6.2,  Kendig]).  Intuitively,  deg(V)  gives  the  number  of  "leaves"  of 
the  variety  V  that  a  typical  plane  lZN~2n  will  intersect. 

Now  we  can  state  Lelong's  theorem  [Lelong.  Thie.  Griffiths]  (or  more  pre¬ 
cisely  just  the  special  case  we  need): 

Theorem  6.2  (Lelong;):  Let  V  be  a  purely  2n -dimensional  homogeneous  com¬ 
plex  variety  in  Cw,  where  n>0.  Let  V[r]  denote  that  part  of  V  contained  in 
Bfi(r)  (the  IV -ball  of  radius  r  centered  at  the  origin).  Then  the  volume  of 
V[r  ]  is  given  by 

voUVlrDsfla,  degOO  -r2"  .  (8.7) 

This  remarkable  theorem  says  the  following:  the  volume  of 
V[r ]  =  VC\Bti(r)  is  identical  to  the  volume  of  By(r)  intersected  with  the 
variety  consisting  simply  of  deg(V)  planes  of  dimension  2n  passing  through 
the  origin.  This  theorem  makes  the  computation  of  the  leading  term  in  the 
expression  for  volume  in  Weyl's  theorem  simple,  given  the  ability  to  compute 
deg(V).  The  preparation  for  proving  the  main  result  in  the  next  section  is 
now  complete. 
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6.3  The  Volume  of  a  Spherical  Neighborhood  of  a  Complex  Homogeneous 
Variety 

The  main  result  of  this  chapter  is  the  following  theorem.  After  the  proof, 
which  is  quite  easy  given  the  preparation  of  the  last  section,  we  will  discuss 
it. 

Theorem  8.3:  Let  V  be  a  complex  purely  2n  dimensional  homogeneous 
variety  of  degree  deg(  V)  in  CN,  where  n>0.  Then  the  fraction  of  unit  sphere 
in  within  Euclidean  distance  c  of  V  is 

( n  -1  )  deg(  V)  e*"  +  o  (e8"1 )  .  (8.8.a) 

where  2m=2N  -2n  is  the  codimension  of  V. 

Interpreting  the  fraction  of  area  as  a  probability,  we  may  restate  this  as 

Prob(dist*(Af.V)  it)  =  )  deg(F)  c®"  +  o(ez"*)  .  (6.B.b) 

where  M  is  uniformly  distributed  on  the  surface  of  the  unit  sphere. 

Proof-  The  surface  to  which  we  would  like  to  be  able  to  apply  Weyl’s  theorem 
is  V  *  the  intersection  of  V  and  the  unit  sphere  in  C*.  From  Fig¬ 
ure  8.2  we  see  that  a  point  is  within  Euclidean  distance  e  of  V  if  and 

only  if  it  is  within  spherical  distance  arcsin  e=e  +  0(e3)  of  V.  Thus,  the 
spherical  neighborhood  whose  volume  we  would  like  to  measure  (and  divide 
by  voliS**1'1)  to  get  the  fraction  of  5W,~1)  is  t/L^, ,( V").  Lelong's  theorem 
tells  us  vol(V[l])  in  terms  of  deg(V);  it  will  turn  out  that 
vol(K)  =  2n  •  vol(V[l]).  Thus,  if  t(F)  existed,  we  could  use  equations 
(6.1)  through  (8.4)  to  compute 


voKr^B«int(F))  ■  vol(F)  (arcsin  c)8w 

vol(5^-‘)  “  Wbjv-i 


0((arcsin  e)®**8) 


=  ■  2n  Den  deg(V)  +  0(e8"»+8) 
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=  (jjnj  )  •  deg(F)  •  t*"  +  0(e2m,*z)  (6.9) 

and  be  done.  Unfortunately,  does  not  exist  for  reasons  we  will  now 

discuss.  Even  so,  we  will  show  that  the  result  of  the  above  computation  is 
valid  provided  we  replace  0(e2m+z  1  by  0(6®"). 

Since  V  is  a  variety,  it  will  in  general  intersect  itself  and  not  be  a  mani¬ 
fold.  If  we  remove  the  set  V,  of  these  intersection  points  from  V  the 
remainder  Vm  *  V  -  Vt  is  a  manifold  [Griffiths].  Furthermore, 
dini(lk»)  =  dim(V),  and  ^  is  a  connected,  open  and  dense  subset  of  V,  so  we 
lose  nothing  by  considering  instead  of  V  [Kendig].  Since  any  homogene¬ 
ous  set  like  V*.  intersects  the  sphere  transversally,  the  intersection 

is  also  a  manifold  of  dimension  one  less  than  Vm.  Unfor¬ 
tunately,  V  also  will  not  generally  possess  spherical  neighborhoods  because 
it  can  contain  "pinched  sections"  as  illustrated  in  figure  6.1.  However,  if  we 
remove  the  open  set  of  all  points  within  Euclidean  distance  77  of  VJ  from  V^', 
what  remains  will  be  a  compact  set  ^(tj)  which  does  have  an  t  neighborhood 
for  t  less  than  a  threshold  l(r})  which  may  go  to  zero  as  77  does.  As  77 
approaches  zero,  the  volume  of  Vm  (77)  approaches  the  volume  of  V*, ,  and  the 
ratio  of  the  volume  of  all  points  within  e  of  Vm  -  1^(77)  to  the  volume  of  all 
points  within  e  of  Vm(r))  goes  to  zero.  Thus,  the  estimate  in  (8.9)  remains 
valid  if  we  replace  by  o(zan). 

It  remains  to  show  that  vol(V)  =  2n  vol(V[l]).  We  show  more  generally 
that  if  V  is  the  union  of  d-dimensional  cones,  then  vol(V)  =  dvol(V[l]).  This 
follows  from  expressing  vol(  V[l])  as  the  integral  in  spherical  coordinates  of 
the  volumes  of  concentric  spherical  sections  of  V 
\ 

▼ol(V[l])  =  f  p*'1  vol(V)  dp 
0 
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=  vol(  V)/  d.  . 

This  completes  the  proof.  Q.E.D. 

Our  first  remark  concerns  the  likely  size  of  the  error  term  a  (eZm). 
Since  we  are  adding  volumes  of  the  form 

const  •  e*m  +  0(e8m+2)  , 

it  is  clear  that  if  the  first  term  in  (6.9)  is  an  underestimate,  it  is  still  an 
underestimate  by  at  most  0{etm¥Z).  Thus  o(tZm)  represents  an  error 
bounded  above  by  0(eZm'l'a).  More  troublesome  is  the  possibility  that  the  first 
term  of  (8.9)  is  an  overestimate.  Consider  the  V  of  Figure  8.3,  which  is  the 
union  of  parts  of  the  plane  curves  y  =0  and  y  =xa*  (K  is  clearly  not  a  homo¬ 
geneous  complex  variety,  but  it  illustrates  our  point).  If  we  add  the  areas  of 
the  e-tubular  neighborhoods  of  these  two  parts,  the  sum  overestimates  the 
area  we  want  to  measure  by  the  area  of  the  shaded  region,  which  is  doubly 
covered  by  the  two  e-neighborhoods.  The  area  of  this  doubly  covered  region 
is  approximately 

QU)l/pn) 

f  (2 e-*8")  dx  =  0(e1  *  1/<8n>)  . 

-(*,)!/ (»0 

The  area  we  want  is  clearly  dominated  by  a  linear  term  in  e.  so  we  see  that 
the  overestimate  depends  on  2 n,  the  degree  of  V. 

Our  second  remark  concerns  the  probabilistic  interpretation  of  (6.7.b). 
Instead  at  choosing  a  random  point  U  uniformly  distributed  on  Sw~',  we 
consider  choosing  a  random  point  U  according  to  the  density  p  and  ask 
about  the  distribution  Prob(dist*(Af/ H  U i|  g  ,  V)  ss  e),  where  ||Jf||*  is  the 
Euclidean  norm  of  U  (Frobenius  norm  if  Ji  is  a  matrix)  so  that  fi/  ||  M\  \  s 
must  lie  on  As  long  as  the  random  variable  Si/  ||  U  j|  g  is  unformly  dis¬ 

tributed  on  S**-1,  Prob(dist*(Af/ ||  U  ||  g  ,V)*e)  will  still  be  given  by  the 
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expression  in  (8.7.b).  Thus,  we  may  apply  our  result  to  any  density  p  for  M 
which  causes  U/  ||  M ||  g  to  be  uniform  on  S2*-1.  A  large  class  of  such  densi¬ 
ties  p  is  simply  characterized  by  the  symmetry  condition  p (JH)  =  /  (!|  M ||  g). 
i.e.  that  the  density  function  p  is  really  only  a  function  of  ||  A/||  Two  well 
known  such  densities  are 

tea  if  it  atiix  <  i 

V\M)  =  |  0  otherwise 

the  uniform  density  on  the  interior  of  the  unit  ball,  and 

p(lH)  =  (2 n)l/N  ■  e~l|,,,,|/z  . 

the  normal  density  on  C*  (i.e.  each  component  of  U  is  an  independent  Gaus¬ 
sian  random  variable  with  mean  0  and  variance  1). 

8.4  Estimating  deg(  V)  of  a  Complex  Homogeneous  Variety 

Next  we  turn  to  the  problem  of  computing  deg(V).  There  are  two  tools 
from  algebra  we  will  use.  Both  are  standard  results  in  algebraic  geometry 
and  can  be  found  in  [Chap  4,  Kendig]. 

Theorem  8.4:  If  the  complex  homogeneous  variety  V  is  defined  as  the  zero 
set  of  the  single  homogeneous  polynomial  p,  then  codim(  V)=2,  and  V  is 
called  a  hypersvrface.  If  in  addition  p  is  the  product  of  distinct  irreducible 
factors,  then  deg(V)=order(p). 

Since  several  interesting  varieties  we  encounter  later  are  defined  by  a 
single  irreducible  polynomial  whose  order  we  know,  this  theorem  supplies  all 
data  needed  to  compute  the  volumes  of  their  spherical  neighborhoods  to  first 
order. 

Our  second  tool  is  a  slightly  nonstandard  version  of  a  well  known 


theorem: 
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Theorem  6.5  (Bteout):  Let  V  be  a  complex  homogeneous  variety  given  as  the 
zero  set  of  the  finite  collection  of  homogeneous  polynomials  {<»!.«.  Then 
we  can  bound  deg(  V)  as  follows: 

1  £  deg(V)  £  ft  orderfo)  .  (6.10) 

<«t 

The  standard  version  of  this  theorem  says  that  if  varieties  Vit  each 
defined  by  the  single  polynomial  pi ,  intersect  transversally,  that  is,  if 

codim(n  K)  =  2  codim(l{)  , 

t»i  i*i 

then  deg(V)  actually  equals  the  product  in  (6.10).  It  is  no  surprise  that  this 
product  provides  an  upper  bound  when  the  do  not  intersect  transversally. 
Unfortunately,  it  seems  to  give  an  atrocious  upper  bound  on  some  occasions, 
but  it  is  simple  to  compute. 

6.5  Real  Varieties 

To  extend  the  results  of  previous  sections  to  real  varieties,  we  need  to 
estimate  the  volume  of  a  real  variety.  The  difficulties  in  doing  this  estimate 
are  illustrated  by  the  following  example.  Consider  the  variety  V(a,6,c) 
defined  by  the  polynomial  jj=ox*  +  by 2  +  czz,  where  neither  a  nor  6  nor  c  is 
zero.  If  x,  y,  and  z  were  complex,  theorem  8.4  would  imply  that  V  had  codi* 
me  ns  ion  2  and  degree  2,  and  so  by  Lelong's  theorem  vol(V[l])  would  be 
2 O^rr2.  independent  of  a,  6 ,  and  c  (as  long  as  they  are  nonzero).  For  x ,  y, 
and  z  real,  we  have  the  following  possibilities,  among  others:  If  a  =6  =  1  and 
c  <0,  then  V  is  a  circular  cone  with  codimension  1  and  area  2rrVc/(c-i), 
which  approaches  0  when  c  does,  and  approaches  2rr  as  c  goes  to  — <*>.  If  a,  b 
and  c  all  have  the  same  sign.  V  degenerates  to  a  single  point  at  the  origin. 

Despite  these  problems,  it  turns  out  we  can  still  derive  an  upper  bound 
on  the  volume  of  a  real  variety  V  given  only  its  dimension  and  degree,  where 
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by  degree  we  mean  the  largest  finite  number  of  intersection  points  of  V  and 
almost  all  planes  L  of  complementary  dimension.  This  bound  uses  Crofton's 
formula  [Santald].  We  can  use  this  bound  in  turn  to  extend  Theorem  6.3  to 
homogeneous  real  varieties  as  follows: 

Theorem  6.6:  Let  V  be  an  n-dimensional  homogeneous  real  variety  of  degree 
deg(V)  in  R*.  where  n>0.  Then  the  fraction  of  the  unit  sphere  in  within 
Euclidean  distance  e  of  V  is  less  than  or  equal  to 


n  ~lL>  •  r(  £-) 


■V-w-i 


IX  ■  r(  zp-)  ■  rx  f) 


deg(V)  tm  +  oic*)  (6.11) 


where  m  =  JV  -n  is  the  codimension  of  V. 


Since  this  theorem  includes  pure  dimensional  homogeneous  complex 
varieties  as  a  special  case,  it  provides  an  upper  bound  to  the  result  in 
theorem  6.3: 


Corollary  6.7:  Let  V  be  a  purely  Sn-dimensional  homogeneous  complex 
variety  of  degree  deg(V)  in  CN,  where  n> 0.  Then  the  fraction  of  the  unit 
sphere  in  CN  within  Euclidean  distance  e  of  V  is  less  than  or  equal  to 


(£:} )  •  deg(V)  +  o(e*»*)  =  [fjj]  jf-  deg(V)  s**  +  o 


Proof:  Convert  the  gamma  functions  in  Theorem  6.6  to  factorials. 

Thus,  this  estimate  is  too  big  by  the  factor  [|^]  /  [£[  ]  for  complex 
homogeneous  varieties.  We  will  see  why  this  factor  appears  later  from 
Crofton's  formula. 


Proof  of  Theorem  6.6:  Just  as  Le long's  theorem  provides  an  estimate  of 
vol(V[r])  =  vol( V(~)Bn(t))  for  a  complex  homogeneous  variety  V,  Crofton's 
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formula  lets  us  estimate  vol(V[r])  for  real  varieties.  Given  this  estimate 
(Lemma  8.B),  the  rest  of  the  proof  is  identical  to  that  of  Theorem  6.3. 


Actually,  Lemma  6.8  applies  to  much  more  general  objects  than 
varieties  (note  that  the  definition  of  deg(V)  makes  sense  for  V  a  union  of 
manifolds): 


Lemma  6.8:  Let  V  be  a  countable  union  of  manifolds  in  of  dimension  n  or 
lower  such  that  deg(V)  is  finite.  Let  l/[r]=  Vf]Bn(r)  be  that  part  of  V  within 
the  ball  of  radius  r.  Then 


vol(V[r])* 


r 

N+l 

2 

■  Vrr 

r 

JV-n+1 

•  r 

n  +  l 

2 

2 

0*  deg(V)  rn 


(8.12) 


Since  this  theorem  includes  complex  pure  dimensional  homogeneous 
varieties  as  a  special  case,  it  provides  an  upper  bound  on  the  value  of 
vol(  V[r])  given  by  Lelong's  theorem: 

Corollary  6.9:  Let  V  be  2 n  dimensional  in  CN,  and  otherwise  as  described  in 
Theorem  6.6.  Then 


vol(Vt>])ss 


deg(V)  ■ 


Proof:  Convert  the  gamma  functions  in  Theorem  6.6  to  factorials. 


(6.13) 


Thus,  this  estimate  is  too  large  by  the  factor  [|^]  /  |  ^  j  for  complex 
homogeneous  pure  dimensional  varieties,  which  is  the  source  of  the  overesti¬ 
mate  in  Corollary  6.7.  Nevertheless,  when  V=SN~l  in  R*  (so  deg(  V)=2)  and 
r=l,  the  expression  for  vol(V[l])  given  by  Theorem  8.8  is  exact,  so  we  see  we 
have  traded  generality  of  the  hypotheses  (unions  of  manifolds  instead  of 
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homogeneous  varieties)  for  tightness  of  the  bound. 

Proof  of  Lemma  6.B:  The  proof  follows  easily  from  several  results  of  integral 
geometry,  all  of  which  can  be  found  in  [Santald].  We  assume  without  loss  of 
generality  that  V  is  a  manifold;  the  general  result  follows  by  applying  the  fol¬ 
lowing  analysis  to  each  constituent  manifold  of  V  and  adding  the  bounds. 


Crofton's  formula  [  equation  14.70,  Santald]  expresses  the  volume  of  an 
n -manifold  V  in  in  terms  of  an  integral: 


vol(  V)  = 


n 

i*l 


-f  . 


(6.14) 


The  integral  is  over  all  N— n  dimensional  planes  where  dLN^n  is  the 

kinematic  density  for  N-n  planes.  This  means  that  the  measure  of  a  set  of 
planes  is  invariant  under  the  group  of  rigid  motions  in  R^.  §  (LN~*  D  V)  is  the 
number  of  points  in  the  intersection  of  Lfi^n  and  V;  by  hypothesis  this  is  a 
nonnegative  integer  bounded  above  by  deg(V)  for  almost  all  LN~n.  Thus 


II  “i 

vol< V)  <  deg( V)  ■=TTr!L—  /  dX*-* 


(6.15) 


Applying  this  to  V[r]  instead  of  V,  and  noting  that  LN_n  can  intersect  V[r] 
only  if  it  intersects  Bn(r),  we  see  that 


n  «» 

imlJI-n 


(6.16) 


vol(V)  *  deg(V)  ‘"Vy2-  /  dL"-*.  (6.16) 

iJthjt*  *—r**)* 

The  integral  in  the  last  equation  is  known  as  a  "cross-sectional  integral", 
(quermassintegral:  [Chap  13.6,  Santald])  because  it  gives  the  measure  of  the 
set  of  planes  which  slice  From  equations  (14.1)  and  (13.46)  of  Santald, 
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n  n  «* 

dL*-"  =  - —  -  •  (6.17) 

n  n  ui 

iwojl-n-l 

Substituting  this  in  the  last  inequality  yields 

vol( V)  <  deg( V)  r*  —  (6. IB) 

It  CJ 0 

which  after  some  manipulation  using  equation  B.l  yields  the  bound  of  the 
lemma.  Q.E.D.  of  Lemma  S.S  and  Theorem  8.6. 

We  turn  now  to  estimating  the  dimension  and  degree  of  a  homogeneous 
real  variety  Kg.  We  assume  Kg  is  given  as  the  locus  of  zeros  of  a  set  (p0j  of 
homogeneous  polynomials  in  the  real  variables  {x*  j.  We  may  assume  without 
loss  of  generality  that  each  pm  has  real  coefficients.  By  allowing  the  to  be 
complex,  ipai  naturally  determines  a  complex  homogeneous  variety  Kc,  and 
it  is  natural  to  ask  about  the  relationship  between  the  degree  and  dimension 
of  Kc  and  the  degree  and  dimension  of  Kg. 

Theorem  6.10:  Let  Kc  and  Kg  both  be  determined  by  |pa{  as  described  above. 
Then 

dim(Kg)  g  (6.19) 

and 


deg(Kg)  ac  deg(K|g)  .  (6.20) 

Proof:  The  relationship  between  dimensions  follows  from  the  implicit  function 
theorem,  which  says  that  if  a  point  p  e  Vq  has  a  neighborhood  U  which  is  a 
manifold  of  dimension  2n,  then  there  is  an  ordering  of  the  coordinates 
X] . Zff  such  that  near  p  Vc  can  be  parameterized  as 


where 


(^i . .  ■  •  ' 
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*=«»<(*  1 . *n) 

By  restricting  the  to  be  real  (but  still  within  the  region  of  definition),  we 


see  that  VB  can  have  dimension  at  most  n. 

Conversely,  such  a  local  parameterization  of  a  real  manifold  can  define 
a  complex  manifold  locally  of  twice  the  dimension  if  the  functions  are 
defined  for  complex  arguments.  In  particular,  a  real  plane  L extends  to  a 
complex  plane  L&f~2n.  If  O  contains  at  most  deg(V’c)  points,  then 

n  V*cZr  n  Vg  can  also  contain  at  most  deg(  Vg)  points.  Q.E.D. 

All  of  the  real  and  complex  varieties  we  consider  can  be  given  as  the  the 
locus  of  zeros  of  {paj  where  each  pa  has  real  coefficients,  so  we  can  use  this 
theorem  to  extend  our  knowledge  about  the  degree  and  dimension  of  com¬ 
plex  varieties  (see  section  6.4)  to  real  varieties.  Indeed,  for  all  the  varieties 
we  study,  the  dimension  of  VBwill  be  exactly  half  the  dimension  of  V© 


Figure  6.3  Counting 
Shaded  I 
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Chapter  7:  Applications  to  Matrix  Inversion,  Eigenvalue  Problems,  and  Poly¬ 
nomial  Zero  Finding 

7.1  Introduction 

In  the  last  chapter  we  proved  two  theorems  about  the  fraction  of  the 
unit  sphere  within  distance  e  of  a  homogeneous  variety.  In  this  chapter  we 
apply  this  result  to  three  problems  of  numerical  analysis.  In  section  7.2  we 
compute  the  fraction  of  n  by  n  matrices  within  s  of  a  matrix  of  rank  at  most 
r.  When  r  =n— 1  this  result  can  be  interpreted  as  the  probability  distribution 
of  the  condition  number  of  a  random  matrix.  In  section  7.3  we  compute  the 
fraction  of  matrices  within  e  of  a  matrix  with  a  given  Jordan  canonical  form. 
In  the  simplest  case,  when  the  Jordan  form  contains  (at  least)  one  double 
eigenvalue,  this  result  gives  the  probability  distribution  of  the  distance  from 
a  random  matrix  to  the  nearest  defective  matrix  In  section  7.4  we  compute 
the  fraction  of  polynomials  within  t  of  a  polynomial  with  a  given  zero  struc¬ 
ture.  For  the  zero  structure  containing  (at  least)  one  double  zero,  this  result 
gives  the  probability  distribution  of  the  distance  from  a  random  polynomial 
to  one  with  a  double  root.  Section  7.5  contains  the  proofs  of  two  algebraic 
lemmas  needed  earlier. 

7.2  The  Distribution  of  the  Distance  from  a  Random  Matrix  to  a  Matrix  of 
Rankr 

In  this  section  we  will  apply  the  general  results  of  the  last  chapter  to  the 
varieties  of  n  by  n  matrices  containing  those  of  rank  at  most  r.  When  r=n-l 
we  are  talking  about  the  variety  of  singular  matrices.  We  adopt  the  proba¬ 
bilistic  interpretation  of  the  results  of  the  last  chapter  and  ask  the  following 
question:  if  a  matrix  if  is  chosen  at  random  so  that  if  / 1|  if  ||  g  is  uniformly 
d  stributed  on  the  unit  sphere,  what  is  the  probability  distribution  of  the  dis- 
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tance  of  Jli/  ||  M\\  g  to  the  set  of  matrices  of  rank  r?  Said  another  way,  what 
is  the  distribution  of  the  relative  distance  from  M  to  the  matrices  of  rank  r? 
To  describe  our  results,  we  let  denote  the  complex  n  by  n  matrices  of 
rank  at  most  r,  and  Vk  denote  the  real  matrices  of  rank  at  most  r.  When 
r=n — 1,  so  that  Vg-1  (Vft-1)  is  the  set  of  singular  matrices,  we  cam  state  our 
result  as  follows: 

Theorem  7.1:  Let  Me  be  a  complex  n  by  n  matrix  chosen  randomly  as 
described  above.  Then 

Prob(dist/>(.tf c  /  II  Afc||  s  •  Vfc-1)  =  n(n2-l)  ■  ez  +  o(e2)  .  (7.1) 

If  Mji  is  a  random  real  matrix,  then 

Prob(dist*(A/«/  ||  Afall  s  .  V#"1 )  s  •  e  +  o  (e)  .  (7.2) 

We  can  interpet  this  theorem  in  a  fashion  more  common  among  numeri¬ 
cal  analysts.  We  define  the  condition  number  of  a  (real  or  complex)  matrix  as 

tfOO-  11*11*- 1!  Af-MI  • 

As  is  well  known  [Eckart],  ||  Af-1||  _l  is  the  Euclidean  distance  distf  from  M  to 
the  nearest  singular  matrix.  The  condition  number  is  used  by  numerical 
analysts  to  measure  the  difficulty  of  inverting  a  matrix,  because  it  gives  the 
maximum  relative  perturbation  that  can  be  caused  in  M~l  by  a  unit  relative 
perturbation  in  M.  In  this  notation,  Theorem  7.1  can  be  restated  as 

Corollary  7.2:  If  Me  a  random  complex  matrix,  then 

ProhOc'(tfc)  *  K)  =  n  (n*-l)  •  K*  +  o  (K~z)  (7.3) 

If  M b  is  a  random  real  matrix,  then 

ProbOc'(Afa)  i/f)<  OluZllL.  +  o(JT>)  .  (7.4) 

Since  the  condition  number  is  commonly  used  as  a  measure  of  how 
difficult  a  matrix  is  to  invert  accurately,  Corollary  7.2  measures  the  likeli- 
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hood  of  a  random  matrix  being  hard  to  invert. 

Note  that  this  theorem  provides  no  information  at  all  unless 
n(n*-l)  •  1.  i.e.  A^v'n(nfe-1)  in  the  complex  case,  and  unless 

Jfen(nz- 1)/2  in  the  real  case.  For  real  10  by  10  matrices,  this  requires  K  to 
be  at  least  495,  already  rather  large  by  the  standards  of  some  numerical 
analysts.  Until  sharper  versions  of  Theorem  6.3  and  8.6  are  forthcoming, 
applying  these  asymptotic  formulas  in  practical  circumstances  must  be  done 
carefully. 

For  matrices  of  rank  r  <n-l  our  results  are: 

Theorem  7.3:  If  Afc  is  a  random  complex  matrix,  then 

<  2  ■,  ) 

Prob(dist*(Afc/  II  Aid!  e  •  Vfc)  =  •  deg(V£)  s^{"~r,,  +  °(e2(B'r)*)  (7.5) 

where 

1  <  deg(  V£)  <  (r  +  l)^"1  ^  .  (7  B) 

If  AfE  is  a  random  real  matrix,  then 

Prob(dist*(AfE/  ||  Af.ll  g  .  VI)  s:  (7.7) 

r(  !i~-)  •  r(  ^-) 

ff((»~r)*~i)/z .  F(  IzEEHL)  .  r(  ^ET) .  ix  ?^r+.rL) * 

2  2  2 

deg(Vfc)  e^'r>'  +  o(e<" -'’•>•) 
where  1  s  deg(V^)  «  deg(V£). 

The  important  aspect  of  these  formulas  is  not  the  constant  coefficient, 
but  the  exponent  of  e.  since  this  exponent  describes  the  behavior  of  the  pro¬ 
bability  as  a  function  of  e.  We  see  that  matrices  of  lower  rank  become  less 
common  rather  quickly  the  exponent  2(n-r)8  (or  (n  -r  )*)  going  up  as  the 


square  of  the  rank  deficiency  n— r. 

We  can  use  Theorems  7.1  and  7.3  to  compute  various  conditional  proba¬ 
bilities.  For  example,  one  might  ask  about  the  probability  of  a  random 
matrix  being  within  relative  distance  e  of  a  matrix  of  rank  r  given  that  it  is 
within  e  of  a  matrix  of  rank  r  +  1.  We  compute  this  simply  by  dividing  the  dis¬ 
tribution  of  the  distance  to  V  by  the  distribution  of  the  distance  to  V+1  to 
get  (in  the  complex  case)  a  constant  times  E4(n-,)~2.  This  quantity  tells  us 
that  surfaces  of  lower  rank  matrices  become  increasingly  sparser  within  the 
surface  of  matrices  of  rank  one  higher.  For  example,  the  density  of  singular 
(rank  n-1)  matrices  within  all  matrices  behaves  as  e2,  rank  n— 2  matrices 
within  rank  n  —1  matrices  as  e®  and  so  on. 

Proof  of  Theorem  7.1:  The  proof  for  Vg-1  will  follow  immediately  from 
Theorem  6.3  if  we  show  that  Vg-1  is  a  purely  2nz -2-dimensional  complex 
homogeneous  variety  of  degree  n.  This  in  turn  follows  directly  from  Theorem 
6.4  since  vg->  is  the  zero  set  of  the  single  irreducible  order  n  polynomial 
det(jtf).  Similarly,  the  result  for  VjJ  will  follow  from  Theorem  6.6  if  we  show 
that  is  an  n2— 1  dimensional  real  variety  of  degree  at  most  n.  The  degree 
bound  follows  from  Theorem  8.10  and  the  dimension  from  noting  that 
det(Af)=0  is  linear  in  each  so  that  mV  can  easily  be  expressed  as  a 
rational  function  in  the  other  n2-l  real  variables.  Q.E.D. 

Proof  of  Theorem  7.3:  As  with  the  last  proof,  the  results  follow  from  Theorems 
6.3  and  8.6  given  the  dimension  and  bounds  on  the  degrees  of  Vg  and  Vg.  To 
compute  the  dimension,  we  use  Gaussian  elimination  to  put  the  matrix  in  row 
echelon  form:  if  M  has  rank  r,  then  its  rows  and  columns  can  be  permuted  so 
that  the  permuted  Ji'  can  be  factored  as  LU.  L  is  lower  triangular  with  ones 
on  the  diagonal  and  nonzeroes  below  the  diagonal  only  in  columns  1  through 
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r.  and  £7  is  upper  triangular  with  nonzeroes  only  in  rows  1  through  r.  These 
nonzero  entries  serve  as  local  coordinates  for  V.  and  there  are  2m— r2  of 
them,  so  that  dim(yfc)=2m— r2,  and  dim(V£)=2dim(Vfc).  since  the  nonzero 
entries  are  all  complex  in  that  case.  To  compute  deg(V£),  we  note  that  Pg  is 
given  by  the  collection  {paj  of  all  determinants  of  order  r  +  1  minors  of  M,  of 
which  there  are  (r+i)  ■  The  bound  on  deg(Vfc)  comes  from  Bfezout’s 
Theorem,  and  the  bound  on  deg(V|t)  from  Theorem  6.10.  Q.E.D. 

7.3  The  Distribution  of  the  Distance  from  a  Random  Matrix  to  a  Matrix  with 
a  given  Jordan  Canonical  Fbrm 

Let  P*  denote  the  set  of  n  by  n  matrices  with  Jordan  canonical  form 
given  by  the  multiindex  J  (p£  denotes  the  set  of  complex  matrices,  and  Pi 
the  real  matrices).  J  denotes  the  Jordan  form  with  (generically)  m  distinct 
eigenvalues  A*  (lsism),  such  that  A*  has  bk  Jordan  blocks  of  sizes 
sf  •  •  zs*4 .  We  say  generic  because  within  P*  lie  lower  dimensional 

surfaces  where  distinct  eigenvalues  A*  and  A t  become  equal,  or  where  the 
number  of  Jordan  blocks  for  a  given  eigenvalue  increase.  This  is  analogous  to 
the  situation  in  the  last  section,  where  the  variety  of  matrices  of  rank  at 
most  r  has  the  matrices  of  rank  exactly  r  as  a  dense  open  subset  whose 
complement  (matrices  of  rank  less  than  r)  form  a  lower  dimensional  sub- 
variety.  (These  statements  about  P*  requi’e  proof;  we  do  not  even  know  yet 
that  P*  is  a  variety.  These  facts  will  be  proven  below.) 

In  this  section  we  answer  the  following  question:  if  the  matrix  M  is 
chosen  at  random  so  that  M/  ||Af|lx  is  uniformly  distributed  on  the  unit 
sphere,  what  is  the  probability  distribution  of  the  relative  distance  from  M  to 
P*?  The  simplest  case  occurs  when  P*  is  the  variety  of  matrices  with  at  least 
one  double  eigenvalue;  in  this  case  our  result  is: 
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Theorem  7.4:  Let  Me  be  a  complex  n  by  n  matrix  chosen  randomly  as 
described  above,  and  let  Pc  denote  the  set  of  complex  matrices  with  at  least 
one  double  eigenvalue.  Then 

Prob(distj(J/c /  ||  Afc|j  g  ,  Pei  =  n(n-l)8(n  +  l)  e2  +  o(c2)  .  (7.8) 

If  Mg.  is  a  random  real  matrix,  and  Pg  the  set  of  real  matrices  with  at  least 

one  double  eigenvalue,  then 

Prob(dist  g(Mg/  H  Mg  |)  g.Pj*  S&T.fffe.t1). .  e  +  o(e)  .  (7.9) 

For  matrices  of  a  general  Jordan  form  J  we  need  to  know  the  degree  and 
dimension  of  P£  and  P^.  Given  this  data,  the  results  will  follow  from 
Theorems  6.3  and  6.5  of  the  last  chapter.  We  compute  dim(PI)  explicitly 
below.  We  outline  a  procedure  for  computing  a  defining  set  of  polynomials 
{pi  j  for  P*.  thus  proving  that  P*  is  a  variety  and  providing  an  upper  bound  on 
deg (P*)  by  B6zout's  theorem  (Theorem  6.5).  We  will  not  display  { or  com¬ 
pute  this  upper  bound,  however,  because  they  are  very  messy  and  do  not 
illuminate  the  structure  of  Pi  nearly  as  much  as  its  dimension,  which  we  do 
compute  explicitly. 

Theorem  7.5:  Let  J  and  P*  be  defined  as  above.  Then  the  codimension  of  P* 
(and  the  exponent  of  e  in  Theorems  6.3  and  6.5)  is 

codim  (Pi)  =  codi™(-PD  =  n  -m  +  2  §  ^  (i-*1)  (7.10) 

“  tsl  i«2 

where  the  sum  from  i=2  to  bk  is  zero  if  6k  =  l.  If  all  the  bk  =  l,  so  that  there  is 
one  Jordan  block  per  eigenvalue,  this  simplifies  to 

ccdim (n>,!”dyi->.n-m  ■  (T.8) 

Thus,  if  there  is  only  one  Jordan  block  per  eigenvalue,  the  codimension 
depends  only  on  the  number  of  distinct  eigenvalues  (matrices  with  one  Jor¬ 
dan  block  per  eigenvalue  are  called  nondrrogatory). 
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Proof  of  Theorem  7.4:  The  proof  for  Pc  will  follow  immediately  from  Theorem 
6.3  if  we  show  that  Pc  is  a  pure  2n*-2-dimensional  complex  homogeneous 
variety  of  degree  n(n-l).  This  will  follow  in  turn  from  Theorem  6.4  if  we  show 
Pc  is  defined  by  a  single  irreducible  homogeneous  polynomial  p  of  order 
n(n-l).  This  polynomial  is  called  the  resultant  of  the  characteristic  polyno¬ 
mial  of  the  matrix  U  and  its  first  derivative,  whose  properties  we  record  in 
the  following  lemma: 

7.7:  Let  M  be  a  matrix  of  n2  indeterminates  m^.  Let 
p(X.mw)  =  det(JI i-\  l)  be  its  characteristic  polynomial.  Then 

r(m^)  *  res (p(X.my)  .  .  X) 

i.e.  the  resultant  of  p  and  its  derivative  is  a  polynomial  in  the  which  is 

1)  zero  if  and  only  if  Jli  has  a  multiple  eigenvalue, 

2)  homogeneous  of  degree  n(n-l),  and 

3)  irreducible. 

Proof:  see  Section  7.5. 

The  result  for  P*  follows  from  Theorems  6.6  and  6.10.  This  completes 
the  proof  of  Theorem  7.4.  Q.E.D.  of  Theorem  7.4. 

Proof  of  Theorem  7.5:  This  theorem  was  originally  proven  in  [Arnold]  and 
discovered  independently  by  us.  Since  the  proof  is  short  and  provides  and 
interesting  application  of  Lie  groups  to  numerical  linear  algebra,  we  sketch  it 
here. 

Let  U  be  a  matrix  with  Jordan  form  J.  Frobenius’s  theorem 
[Cantmacher]  characterizes  all  matrices  which  commute  with  if,  and  shows 
in  particular  that  they  form  a  linear  manifold  of  dimension 

«<J)  «  £  5  (2i— 1)  s? 

**1  i>l 

for  real  matrices,  and  2  s  (J)  for  complex  matrices.  Now  consider  the  Lie 
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group  GL(n,C)  of  nonsingular  n  by  n  complex  matrices,  which  has  dimension 
2n*.  Let  Zy  denote  the  centralizer  of  li  in  GL(n,C),  that  is,  the  set  of  all  non¬ 
singular  matrices  which  commute  with  M-  We  know  dim(Zjf)  by  the  result 
just  stated.  It  is  easy  to  see  that  Zy  is  a  closed  subgroup  of  GL(n.C).  so  that 
the  quotient  space  GL(n.C)/  Zy  is  a  manifold  of  dimension  2  (nz  -  z(J)) 
[Sternberg].  This  quotient  space  is  naturally  difleomorphic  to  the  set  Sy  of 
matrices  which  are  similar  to  hi,  since  ZjAfZf1  =  Z^MZi1  if  and  only  if  Zx 
and  Za  are  in  the  same  coset  of  Zy.  Now  we  claim  SyxCF1  (the  Cartesian  pro¬ 
duct),  which  has  dimension  2  (m  +  n*  —  z  (J)),  is  locally  difleomorphic  to 
Pi  The  difleomorphism  simply  takes  the  j-th  component  of  the  C"  factor 
and  adds  it  to  the  m-th  distinct  eigenvalue  of  Sy.  Subtracting  dim(5ifxCm) 
from  2n8  yields  the  value  of  codim(Pi)  claimed  in  the  theorem. 

The  same  proof  works  in  the  real  case,  yielding  something  of  exactly  half 
the  codimension  of  P£.  We  do  need  two  additional  facts:  if  two  matrices  over 
a  field  F  are  similar  over  an  extension  field  K  of  F,  then  they  are  similar  over 
F  (because  a  matrix  is  similar  to  its  rational  canonical  form  over  F),  and  two 
complex  conjugate  eigenvalues  of  a  real  n  by  n  matrix  are  determined  by 
two  real  parameters,  so  SjfXC"1*  can  be  replaced  by  SyxW*  above.  Q.E.D.  of 
Theorem  7.5. 

It  remains  to  show  how  to  construct  a  set  of  polynomials  \pi  {  which 
determine  P*.  The  construction  has  two  steps.  First,  we  construct  a  set  of 
polynomials  in  the  matrix  entries  and  the  eigenvalues  \  whose  projec¬ 
tion  onto  the  my  coordinates  is  P*.  Second,  we  show  how  to  eliminate  the  \ 
variables.  This  elimination  requires  a  generalization  of  the  fundamental 
theorem  on  symmetric  polynomials  to  symmetric  varieties. 
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A  polynomial  p(At . A*)  is  called  symmetric  if  for  all  permutations  c 

of  n  objects  we  have 

Prfci . . **<n))=p(A, . A»)  . 

The  fundamental  theorem  on  symmetric  polynomials  [VanderWaerden]  says 

that  p(Ai,  ....  A*)  is  symmetric  if  and  only  if  it  can  be  expressed  as  a  poly¬ 
nomial  in  the  elementary  symmetric  functions  Et  of  the  At: 

Si  «  £  At 

Wan 


Ss*  2 

bti  <j*n 


At  •  A f 


Sa  =  2  At  A^  A* 


Sn  *  n  At 
Man 

A  variety  V  which  is  generated  by  fcpa(A, . A*)}  is  called  symmetric  if  for 

all  permutations  a  V  is  also  generated  by  |pa„j.  Geometrically,  this  means  V 
is  invariant  under  the  group  generated  by  all  reflections  in  planes  Now 

we  can  state  our  generalization  of  the  fundamental  theorem  on  symmetric 
polynomials  to  symmetric  varieties: 

I^mma  7.7:  Let  the  variety  V  be  generated  by  \pa{ A, . A*)}.  Then  V  is 

symmetric  if  and  only  if  V  is  generated  by  a  set  of  polynomials 

. £*)i  in  the  elementary  symmetric  functions  of  the 

Proof:  See  section  7.5. 

For  our  construction  of  Jpjj  we  also  need  to  know  that  the  union  and 
intersection  of  varieties  are  varieties.  For  if  |p*j  generates  P  and  \qt  j  gen¬ 
erates  Q,  then  (p*  ,  qj\  generates  PC\Q>  and  ip*  generates  P\jQ. 

Now  we  begin  the  construction.  Let  m*  *  ^  sf  denote  the  algebraic 

«>i 
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multiplicity  of  the  fc-th  distinct  eigenvalue.  Now  take  the  eigenvalues  and 
constrain  them  with  the  following  polynomials: 

Xj  =  Xg  .  =  Xg . Xj  =  Xmj  , 

Xiaj^l  =  Xnij^g  .  =  Xfl^+3 . Xj^j+i  =  Xmtfm| . 

X  4  =  X  . i  .  X,  =  Xj  *  Xjl  . Xy+i  ,  — 

2  mj*l  2  m,+Z  2  "H  +  t  2  "H+S  2  "S*l  2  "*i 

(-1  i*i  <>I  <»i  Oi  «■» 

In  other  words,  equate  the  first  ml  eigenvalues  to  X,,  the  next  m2  to  X^,^,. 
and  so  on.  Next  take  the  polynomials  det(Af-Xj  /)= 0,  where  j  takes  k  values 
corresponding  to  distinct  Xy.  This  last  set  of  equations  guarantees  that  each 
Xj  is  an  eigenvalue  of  M.  The  constraints  on  the  sizes  of  the  Jordan  blocks 
can  be  translated  into  constraints  on  the  rank  of  powers  of  ■  /,  because 

rank(Af-Xy/)1_1  -  rank(Xf-Xj  7)‘  equals  the  number  of  Jordan  blocks  of  size 
at  least  i,  which  we  denote  B{.  Thus, 

rank (jf-X,  /)*  =  n  -  £  Bl  . 

k«l 

Clearly,  the  si  uniquely  determine  the  Bl,  and  vice-versa.  Also,  any  collaps¬ 
ing  of  X/s  or  breaking  up  of  Jordan  blocks  (which  occur  on  subvarieties)  can 
only  make  rankCtf-X^)*  drop.  From  Section  7.2  we  know  how  to  express  the 
condition  that  rank(j/-X*  7)1  should  be  no  more  than  some  constant  in 
terms  of  determinants  of  minors.  All  these  polynomials  taken  together,  for 
all  m  distinct  eigenvalues  X,-  and  powers  i,  determine  a  variety  in  C**xCm 
space  whose  projection  onto  the  C*  component  is  P1. 

However,  there  are  many  other  varieties  whose  projection  is  P*.  If 
lPa(X*  .  m^)}  generates  the  variety  of  the  last  paragraph,  and  if  a  is  a  per¬ 
mutation  of  the  first  n  integers,  then 


106 


V c  *  •  "V)l  a  •  "SfH 

also  has  the  same  projection.  Consider  the  variety  V  a  \jVa,  and  let 

• 

{g*(A*  .  b®  a  finite  set  of  polynomials  generating  V  (we  know  how  to 
construct  the  qp  from  the  pa  and  the  rule  for  taking  unions  of  varieties).  It  is 
clear  from  the  construction  of  V  that  the  ideal  generated  by  )J  is 

symmetric,  that  is  \qp(\q(t) .  my)j  generates  the  same  ideal  (and  variety  K) 
for  any  permutation  a.  By  Lemma  7.7,  we  see  that  there  is  a  set  of  polynomi¬ 
als  \ry\  which  also  generate  V  but  which  are  functions  of  and  the  elemen¬ 
tary  symmetric  functions  2*  of  the  A*.  But  these  E*  are  nothing  but  the 
coefficients  of  the  characteristic  polynomial  in  U,  which  are  polynomials  in 
rriq.  Thus,  the  ry  themselves  are  polynomials  only  in  the  m^.  These  ry  are 
the  desired  polynomials  which  generate  P*. 

7.4  The  Distribution  at  the  Distance  from  a  Random  Polynomial  to  One  With 
a  Given  Zero  Structure 

Let  Z*  denote  the  set  of  n-th  degree  polynomials  p  (z )  =  £  pi  a*  with 

t»i 

zero  structure  given  by  the  multiindex  K  (Zf  denotes  the  complex  matrices, 
and  Zf  the  real  matrices).  K  denotes  the  zero  structure  with  (generically)  m 
distinct  zeroes  A*  (Kkam),  such  that  A*  has  multiplicity  m*.  We  say  generic 
for  the  same  reason  as  in  the  last  section:  within  Z *  lie  lower  dimensional 
surfaces  within  which  distinct  roots  coalesce.  This  will  be  proven  below. 

In  this  section  we  answer  the  question:  if  the  n-th  degree  polynomial  p 
is  chosen  at  random  so  that  p  /  lip  ||  g  is  "uniformly  distributed"  on  the  unit 
sphere,  shat  is  the  probability  distribution  of  the  relative  distance  from  li  to 

2®?  (||p || g  is  the  norm  (]?J  ip*l*),/8.)  The  reason  for  the  quotation  marks 

i*0 

around  "uniformly  distributed"  is  our  insistence  on  choosing  an  n-tb  degree 
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polynomial  at  random,  which  means  that  we  eliminate  the  hyperplane  p*  =0 
from  our  sample  space.  This  makes  sense  because  ifpn=0we  have  a  qualita¬ 
tively  different  problem  because  we  have  a  polynomial  of  different  degree. 
Smale  [Smale]  considers  the  nonhomogene ous  problem  (p*  =  l).  but  by  main¬ 
taining  homogeneity  we  can  still  the  results  of  chapter  6. 

The  simplest  case  occurs  when  Z*  is  the  variety  of  polynomials  with  at 
least  one  multiple  root;  in  this  case  our  result  is: 

Theorem  7.B:  Let  pc  be  a  complex  degree  n  polynomial  chosen  randomly  as 
described  above,  and  let  Zc  denote  the  set  of  complex  matrices  with  at  least 
one  double  zero.  Then 

Prob(dist^(pc  /  II  Pell  B  •  Zc)  =  n(n2-n-l)  e2  +  o(ez)  .  (7.11) 

If  Pb  is  a  random  real  polynomial,  and  Z*  the  set  of  real  polynomials  with  at 

least  one  double  zero,  then 

Prob(distg(pB /  ||pB||  g  ,  Zb)  c  n^n  c  +  o(e)  .  () 

For  polynomials  of  general  zero  structure  Z*  we  need  to  know  the 
degree  and  dimension  of  Zf  and  Z\ f  so  we  can  apply  Theorems  6.3  and  6.5  of 
the  last  chapter.  We  compute  dim(Z*)  explicitly  below,  but  just  as  in  the  last 
section  we  only  outline  a  procedure  for  computing  a  defining  set  of  polynomi- 
als  IpJ}  for  Z*. 

Theorem  7.9:  Let  K  and  Z*  be  defined  as  above.  Then  the  codimension  of  Z* 
(and  the  exponent  of  e  in  Theorems  6.3  and  6.5)  is 

codim =  n  -m 

so  that  the  codimension  depends  only  on  the  number  of  district  eigenvalues. 

This  theorem  is  analogous  to  Theorem  7.5  of  the  last  section  for  non¬ 
derogatory  matrices,  which  is  no  surprise  since  the  rational  canonical  form 
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of  a  nonderogatory  matrix  is  determined  uniquely  by  the  characteristic  poly¬ 
nomial. 

Proof  of  Theorem  7.9:  As  in  the  analogous  Theorem  7.4,  we  use  the  resultant 
r  of  the  polynomial  p  and  its  first  derivative,  which  is  a  homogeneous  polyno¬ 
mial  of  degree  n(n-l).  Unfortunately,  this  polynomial  does  not  have  the 
property  of  being  zero  if  and  only  if  p  has  a  multiple  root,  because  it  is  also 
zero  if />*  =0,  the  set  of  points  we  eliminated  from  our  sample  space  above.  If 
we  divide  r  by  pn,  we  get  a  polynomial  d  called  the  discriminant  of  p,  which 
is  homogeneous  of  degree  n8— n— 1  and  irreducible  (a  proof  of  irreducibility 
follows  from  the  proof  of  Lemma  7.7  below),  d  will  be  zero  if  and  only  if  p  is  of 
degree  n  and  has  a  multiple  zero  or  pn=pn_1=0,  but  this  last  part  is  a  lower 
dimensional  subvariety  of  the  forbidden  part  of  our  sample  space,  so  does 
not  contribute  to  vol(d).  The  result  follows  from  applying  Theorems  8.3,  8.4, 
8.8  and  8. 10  to  d .  Q.E.D.  of  Theorem  7.9. 

Proof  of  Theorem  7.10:  The  m+1  parameters  \  (1 sism)  and  p„  form  a  local 
coordinate  system  for  the  pj  as  is  easily  seen  by  equating  powers  of  z  in  the 
identity 

£  Pt  **  •  P*  ft  ■ 

<«0  «■» 

Thus.  Zf  has  dimension  2(m+l)  and  codimension  2(n-m)  as  claimed.  The 
real  case  follows  since  complex  zeroes  occur  in  complex  conjugate  pairs,  so 
all  dimensions  and  codimensions  are  cut  in  half.  Q.E.D.  of  Theorem  7.10. 

It  remains  to  show  bow  to  construct  a  set  of  polynomials  Ipf }  defining 
Z *.  The  construction  is  analogous  to  the  construction  in  the  last  section  for 
P1.  First  we  construct  a  set  of  polynomials  in  the  coefficients  p\  and  the 
zeroes  \  which  define  a  symmetric  variety  whose  projection  onto  the  first 
components  is  ZK.  These  polynomials  simply  equate  different  \  and  express 
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the  p1  as  the  usual  symmetric  functions  of  the  X,- .  Second,  we  eliminate  the 
using  Lemma  7.8  just  as  in  the  last  section. 

7.5  Proofs  of  Lemmas  7.7  and  7.8: 

Lemma  7.7:  Let  U  be  a  matrix  of  nz  indeterminates  m^.  Let 
p(Kmij)  =  det(M-\  /)  be  its  characteristic  polynomial.  Then 

rimy)  a  res(p(X.mv)  .  ^-p(X,m^) .  X) 
i.e.  the  resultant  of  p  and  its  derivative  is  a  polynomial  in  the  which  is 

1)  zero  if  and  only  if  U  has  a  multiple  eigenvalue. 

21  homogeneous  of  degree  n(n-l),  and 
3)  irreducible. 

Proof:  The  resultant  of  the  two  polynomials  p(X)  =  2  Pi  **  e  /?[X]  and 

<•0 

g(X)  =  gt  X*  e  /?[X]  is  denoted  res (p,g,X)  (or  res(p,g)  if  X  is  clear  from 
*«o 

context)  and  defined  as  the  determinant 


Po 

Pi 

*  *  * 

Pm 

Po 

Pi 

•  •  • 

Pm 

Po 

Pi 

Pm 

9o 

9i 

9 n 

9o 

9  i 

•  •'  • 

In 

9o 

9i 

9n 

where  there  are  m  copies  of  the  rows  with  g  entries,  and  n  copies  of  the 
rows  with  p  entries.  res(p,g)  is  clearly  a  polynomial  in  the  pt  and  qj.  If 
Pfe^O^g*  then  res(p,g)=0  if  and  only  if  p  and  g  have  a  common  zero  [Van- 
derWaerden].  If  we  choose  g(X)=p’(X)  to  be  the  derivative  of  p  and  pm*0, 
then  res (p,p')  will  be  zero  if  and  only  ifp  has  a  multiple  root  [VanderWaer- 
den).  Applying  this  to  p(X)  =  det(Af-XZ)  proves  claim  1  above. 
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This  choice  of  p  is  clearly  homogeneous  of  degree  n  in  the  A  and  my,  so 
p'  is  homogeneous  of  degree  n— 1.  By  Theorem  6.2  in  [Kendig].  their  resul¬ 
tant  is  homogeneous  of  degree  n(n-l).  This  proves  claim  2  above. 

The  proof  of  claim  3  takes  two  steps.  First  we  show  that  if 

r(A)=An  +  Y]  r*  A*,  then  res (r.r')  is  irreducible.  Second  we  show  that  this 
t-o 

implies  the  irreducibility  of  res (p.p‘). 

To  show  d  =  res  (r.r')  is  irreducible,  we  use  another  representation  of  it 
in  terms  of  the  zeros  a*  of  r:  d  =  J"J  (a<  —  a^)8  [VanderWaerden].  Write  d  as 

the  product  d=dj  ■  dz.  Since  the  are  functions  of  the  r(,  they  are  sym¬ 
metric  functions  of  the  a*.  Since  o^-ag  is  a  factor  of  d,  it  must  divide  either 
dt  or  dz,  say  d,.  By  symmetry,  all  the  other  factors  a<-o^  must  divide  dx,  so 
dx  is  a  constant  multiple  of  d,  and  d  is  irreducible. 

Now  consider  the  companion  matrix  of  r: 

0  1  0  0 

0  0  10 

0  0  0  1 

-»n-l  ~Th-Z  -^n-S  -*0 

r  is  the  characteristic  polynomial  or  this  matrix  [Herstein].  Now  if  res(p.p') 
factored  into  px  p2,  this  would  induce  a  factorization  of  d=res(r,r')=d,d2. 
By  the  result  of  the  last  paragraph,  px  (say),  which  corresponds  to  d)a  will  be 
a  constant  multiple  of  d  so  that  pg  cannot  depend  on  any  entry  m**  of  the 
last  row  of  U ,  but  only  on  entries  m*it+lof  the  super  diagonal.  Now  take  M  and 
exchange  rows  i  and  n  as  well  as  columns  i  and  n  to  obtain  the  similar 
matrix  ft.  ft  has  the  same  characteristic  polynomial  as  If,  so  we  conclude 
that  Ps  cannot  depend  on  entries  from  row  i.  Thus  p2  is  constant  and 
res(p,p')  is  irreducible  as  desired.  Q.E.D.  of  Lemma  7.7. 


Ill 


Lemma  7.8:  Let  the  variety  V  be  generated  by  {pB(Xi . A*)}.  Then  V  is 

symmetric  if  and  only  if  V  is  generated  by  a  set  of  polynomials 
,  Zn)}  in  the  elementary  symmetric  functions  of  the  A*. 

Proof:  The  if  direction  is  trivial.  If  there  is  only  one  pa,  the  symmetry  condi¬ 
tion  implies  pa(\)  =  pa(A«( «))  for  all  permutations  a.  so  the  only  if  direction  is 
equivalent  to  the  fundamental  theorem  on  symmetric  polynomials.  If  there 
is  more  than  one  pa  we  argue  as  follows.  We  let  p „  denote  the  polynomial 
Pa(**(<)X  and  V*  denote  the  variety  generated  by  £pa<r j.  By  assumption  Va=  V 

for  all  a.  Then  the  variety  Vt  -  n  V*  is  generated  by  {pa0  ,  all  a  and  ctJ  which 

* 

equals  n  P*.  where  Va  is  generated  by  .  a  fixed,  all  aj.  If  we  can  show 

a 

the  variety  Va  is  generated  by  a  collection  of  polynomials  over  the  we  will 
be  done.  We  claim  this  collection  of  polynomials  is  the  set  {Et(pa)j  of  all  sym¬ 
metric  function  of  the  p„  themselves.  For  all  the  £* (p„)  can  all  be  zero  if  and 
only  if  all  the  p^  are  zero,  so  they  generate  the  same  variety.  Furthermore, 
each  £i(Pa)  is  clearly  a  symmetric  function  of  the  and  so  by  the  funda¬ 
mental  theorem  on  symmetric  functions,  is  itself  a  function  of  the  Lt.  Q.E.D. 
of  Lemma  7.8. 
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Chapter  8:  Probabilistic  Estimates  of  diss^(<7j  ,  a2) 

8.1  Introduction 

In  this  chapter  we  apply  the  probabilistic  estimates  of  chapters  6  and  7 
to  measure  the  likelihood  that  the  various  bounds  on  dissj(cr2  ,  c2  .  path)  and 
dissf(o1  ,  o2  ,  region)  of  chapter  5  are  accurate. 

In  Section  8.2  we  compute  the  probability  that  a  randomly  chosen 
matrix  is  completely  diagonalizable,  where  our  decomposability  criterion  is 
based  on  path  clustering.  We  also  compute  upper  bounds  on  the  probability 
of  being  able  to  decompose  a  random  matrix  into  blocks  with  no  more  than  r 
eigenvalues  per  block,  where  r>l.  In  Section  8.3  we  estimate  the  decomposi¬ 
tion  probabilities  using  a  different  decomposability  criterion:  a  is  decompos¬ 
able  into  ij  ot  if  !!P<)|  sfi  K  for  all  i  (Ft  is  the  projection  belonging  to  at).  In 
i 

Section  8.4  we  ask  how  much  larger  the  upper  bound  on  diss2( Oj  .  a2  ,  path)  is 
likely  to  be  than  the  lower  bound.  We  also  consider  how  likely  the  lower 
bound  on  dissf  (ot  ,  ae ,  region)  is  to  be  accurate  when  a2  contains  exactly 
one  eigenvalue.  Finally,  in  Section  8.5,  we  make  probabilistic  comparisons  of 
sep  and  sepx,  and  compute  the  probability  distribution  of  sepx.  For  ease  of 
presentation  we  consider  only  complex  matrices  in  this  charier;  probabilis¬ 
tic  statements  for  real  matrices  are  in  all  cases  similar  and  follow  from 
analogous  theorems  for  random  real  matrices  in  chapter  7. 

8.2  The  Probability  of  Being  Able  to  Diagonalize  a  Matrix 

We  recall  our  path  clustering  criterion,  introduced  in  chapter  1:  we  may 
decompose  the  spectrum  a  of  a  matrix  U  into  \J  ai  provided  no  perturbation 

i 

of  Euclidean  norm  e  or  smaller  can  cause  an  eigenvalue  in  some  o*  to 
coalesce  with  an  eigenvalue  from  some  other  .  In  this  section  we  ask  the 
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question:  if  the  matrix  M  is  chosen  at  random  in  the  sense  of  chapters  6  and 

7.  what  is  the  probability  that  U  is  decomposable  into  u  We  may  apply 

i 

the  result  of  section  7.3  to  answer  this  question: 

Theorem  B.l:  Let  U  be  a  random  complex  matrix  and  e>0  a  constant.  Then 
the  probability  that  all  matrices  in  the  set  Mt  of  matrices  within  Euclidean 
distance  t  of  M  are  completely  diagonalizable  is 

1  -  n(n-l)8(n  +  l)  e®  +  o(e2)  . 

Proof:  A  matrix  Si'eMt  is  not  completely  diagonalizable  if  and  only  if  M  is 
within  e  of  a  matrix  with  a  double  eigenvalue.  Now  apply  Theorem  7.4.  Q.E.D. 

This  approach  does  not  extend  to  computing  the  probability  of  being 

able  to  decompose  a  of  a  random  matrix  M  into  \j  oi  where  each  o  contains 

t 

at  most  r  eigenvalues  (rather  than  just  1).  In  other  words,  when  r>l  it  1'  lot 

true  that  a  decomposes  into  U  at,  if  and  on'y  if  M  is  not  within  e  of 

t 

a  matrix  with  an  r+l-tuple  eigenvalue.  This  is  the  point  of  Wilkinson’s  exam¬ 
ple 

1 

2rj  1 
3rj  1 
477 

presented  in  chapter  1:  when  e  is  on  the  order  of  rj°,  o(M)  cannot  be  decom¬ 
posed  at  all  even  though  U  is  not  within  rjz  of  a  matrix  with  a  quadruple 
eigenvalue. 

It  is  true,  however,  that  the  spectrum  of  a  matrix  within  e  of  one  with  an 
r+l-tuple  eigenvalue  is  not  decomposable  o  =  \J  with  #(o<)sr  for  all  i,  so 

i 


we  still  have  an  upper  bound  on  the  probability  of  such  a  decomposition: 
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Theorem  8.2:  Let  Hi  be  a  random  complex  matrix  and  e>0  a  constant.  Then 
the  probability  that  a(M)  is  decomposable  into  Ct  of  size  at  most  r<n  (sub¬ 
ject  to  the  constraint  imposed  by  t  )  is  at  most 

1  - /(n.r)  •  e2r  +  o(c8r)  . 
where  /(n,r)  depends  only  on  n  and  r. 

Proof:  If  an  n  by  n  matrix  has  an  r  +  1-tuple  eigenvalue  it  can  have  at  most 
n-r  distinct  eigenvalues.  Now  apply  Theorem  7.5.  Q.E.D. 

Whether  the  exponent  2r  (or  r  in  the  real  case)  in  this  last  theorem  is 
best  possible  is  an  open  question. 

B.3  The  Probability  of  a  Random  Matrix  haring  all  Projections  of  Small  Norm 

In  this  section  we  also  are  interested  in  the  probability  of  being  able  to 

block  diagonalize  a  random  matrix,  but  now  our  decomposition  criterion  is 

different:  we  may  decompose  the  a  of  a  matrix  U  into  ij  crt  provided 

« 

||  Pill  <K  for  all  i,  where  Pt  is  the  projection  corresponding  to  a*.  We  con¬ 
sider  the  probability  of  completely  diagonalizing  M: 

Theorem  8.3:  Let  M  be  a  random  complex  matrix,  with  one  dimensional  pro¬ 
jections  Pt.  Then 

Prob(||  Pt||  <  A- for  alii)*  1  -  2n(n-l)8(n  +  l)  K~e  +  o(jr*)  . 

Proof :  Since 

Prob(||  Pi  ||  <  K  for  all  <)  =  1  -  Prob(  some  j|  P<  ||  >  K)  , 
it  suffices  to  show  that  2n(n-l)*(n  +  l)ez  +  o(c2)  is  an  upper  bound  on  Prob( 

some  ||  P(||  i  K).  But  by  Lemma  5.1.  ||  Pt  j  |  implies  that  the  relative  dis¬ 
tance  from  U  to  a  matrix  with  a  double  eigenvalue  is  no  more  than 
V2/  (A*-l).  and  the  result  follows  from  Theorem  7.4.  Q.E.D. 


We  can  combine  this  theorem  with  Theorem  3.3  to  compute  the  approxi¬ 
mate  probability  distribution  of  the  condition  number  *(S)  =  ||  5||  •  ||  S~'j| 
of  either  the  best  conditioned  matrix  5  which  diagonalizes  a  random  matrix 
M:  S~lMS  =  diag(A<).  or  the  nearly  best  conditioned  5  computed  in  chapter 
3: 

Theorem  8.4:  Let  M  be  a  random  complex  matrix  and  let  £  be  the  nearly 
best  conditioned  diagonalizing  similarity  ( S~XUS  -  diag(\))  computed  in 
chapter  3.  Then 

Prob(/c (S)  <K)>  1  -  2ns(n— l)2(n  +  l)  K~2  +  o(JTz)  . 

This  inequality  is  also  true  if  S  is  the  best  conditioned  similarity. 

Proof:  By  Theorem  3.3,  k(S )  s  n  ■  max  !!  )! ,  so 

Prob(*(S)  <  K)  as  Prob(max  ||  Pt  ||  <  . 

t  R 

Now  apply  Theorem  8.3.  Since  *(5)  is  an  upper  bound  for  k  of  a  best  condi¬ 
tioned  Sqptiual ■  these  bounds  also  hold  for  k(Scptiiial)-  Q  E.D. 

The  probability  of  decomposing  a  of  Id  into  \j  o*  where  #(aj)s:r  and 

« 

||  Pi  ||  <K.  is  clearly  at  least  this  large,  but  hew  much  larger  we  do  not  know. 

8.4  How  Close  are  the  Upper  and  Lower  Bounds  on  dis$f(ff, ,  oz)? 

In  this  section  we  consider  the  upper  and  lower  bounds  on  diss^ (a, ,  az) 

>/2  sep*C4.f?)  *  dissg(<r, .  ag)  i  (5'2) 

for  general  o,  and  also  in  the  case  where  ff,  contains  just  one  simple  eigen¬ 
value.  (The  notation  is  from  Chapter  5:  we  assume  ||  JIf  || g&  1,  and 
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We  see  immediately  from  Theorem  8.4  that  with  probability  approaching 
1  as  0(K~ z)  that  the  upper  and  lower  bounds  above  will  not  differ  by  more 
than  a  factor  of  K  for  all 

To  ask  about  the  distance  between  the  bounds  for  a  given  o j,  however,  is 
not  a  question  that  lends  itself  to  a  probabilistic  interpretation,  because 
there  is  no  realistic  way  to  probabilistically  model  the  human  choice  of  one 
Oi  or  another.  Our  approach  still  lets  us  measure  the  volume  of  the  set  of 
matrices  for  which  one  bound  is  more  accurate  than  another,  though;  we 
only  must  not  attribute  a  probabilistic  interpretation  to  it. 


We  consider  the  special  case  where  ax  contains  a  single  eigenvalue  of 
multiplicity  one.  In  particular,  we  claim  that  for  most  matrices  the  lower 
bound  in  5.1  above  is  more  accurate  than  the  upper  bound.  This  claim  is 
justified  by  considering  the  second  example  of  section  4.4: 


M  = 


for  which  we  showed  the  lower  bound  in  5.1  is  accurate  to  within  a  small  fac¬ 
tor.  For  a  general  matrix,  B  will  not  be  diagonal  as  in  the  example  but  by 
Theorem  8.4  it  will  be  diagonalizable  by  a  similarity  transformation  whose 
condition  number  exceeds  K  on  a  set  of  matrices  whose  volume  goes  to  zero 
as  K  goes  to  infinity.  Thus,  by  Lemma  2.1,  diss*(<yj  ,  oz)  will  not  exceed  its 
lower  bound  by  more  than  a  factor  of  K  where  K  is  only  large  on  a  small  set 
of  matrices:  those  where  B  has  a  Jordan  block  of  size  at  least  2  with  eigen¬ 
value  a.  In  particular,  as  long  as  U  is  not  close  to  having  a  triple  eigenvalue 
at  a  (such  matrices  having  codimension  4  in  the  complex  case  by  Theorem 
7.5),  then  the  lower  bound  will  be  nearly  accurate. 
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ones. 

To  prove  these  claims,  we  need  to  introduce  yet  another  nested  family  of 

pejorative  varieties.  Consider  the  Euclidean  space  tf'***’1**,  where  the  first 
nA2  coordinates  represent  A  and  the  other  ng2  coordinates  represent  B  in 

the  obvious  way.  Thus,  we  may  refer  to  a  point  in  c*-4**"®*  by  its  coordinates 
( A,B ).  We  will  need  two  norms  on  this  space,  the  usual  Euclidean  one 

I!  U.*)ll i-  *  VpmTTTsiII 

and 

HU, 5)1!  a  max (||  A|| ,  .  \\B\\g)  . 

Clearly 

IIU.-B)!!  <\\{A.B)\\s<y/2\\{A.B)\\  .  (8.2) 

As  usual  we  are  primarily  interested  in  what  happens  on  the  unit  sphere 

\(A.B)  :  (|  (A.B) (!  g  *  1{.  Theorem  5.2  is  valid  on  (and  inside)  this  set,  and  so 
implies  that  sep  and  sepA  must  approach  zero  simultaneously.  This  motivates 
investigating  the  set  P  -  [(A.B)  :  sep (A,B)  =  sepxU.fi)  =  OJ.  Not  surpris¬ 
ingly,  P  is  a  homogeneous  variety;  in  fact  it  is  the  zero  set  of  the  single 
irreducible  order  nA  ■  ng  polynomial  detOi^jj)  =  ±det('kgA).  What  is  the 
shortest  distance  from  a  given  {A ,B)  to  PI  In  analogy  to  the  notation  of 
chapters  8  and  7,  we  denote  the  minimum  distance  dist g((A,B),P)  if  we  use 
the  ||  •  ||  g  norm,  and  dist8(U. fi).P)  if  we  use  the  ||  •  ||  norm.  It  is  immedi¬ 
ate  from  the  definition  of  sepxU.fi)  that  sepxU.fi)  is  precisely  the  distance 
from  (A.B)  to  P  in  the  ||  •  ||  norm;  we  record  this  fact  as 

Lemma  8.5: 

■epxU.fi)  =  distj(U.fi) ,  P) 

This  immediately  suggests  applying  Theorems  8.3  and  6.4  to  P  to  prove 
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Theorem  8.8:  Let  ( 4,2? )  be  chosen  at  random  so  that  (A.B)/  \\  (A,B)\\  £  is 
uniformly  distributed  on  the  unit  sphere  in  Cf4,+n**.  Then 

<«, w-tK«.  **  *  •<«*>*  probtsepxc  .  TrrrS>irr  > s  e) 

ss  2(ni4e+ns®-l)nJ,nB  tz  +  o(ez) 

Proof:  The  proof  is  a  by  now  routine  application  of  Theorems  6.3  and  6.4, 
combined  with  Lemma  8.6  and  inequality  (B.2).  Q.E.D. 

Note  that  it  is  possible  to  randomly  choose  a  pair  (.4,2?)  not  only  so  that 
( A.B )/  ||  (4.2?) ||  b  is  uniformly  distributed  on  the  sphere  as  required  by  the 
theorem  but  also  so  that  A  and  B  are  statistically  independent  (e.g.  let  each 
entry  of  A  and  B  be  an  independent  Gaussian  random  variable  with  mean  0 
and  variance  1).  We  do  not  know  if  this  method  of  choosing  A  and  B  is  a  real¬ 
istic  model  of  the  distributions  induced  by  choosing  the  original  T  matrix  at 
random,  but  we  will  use  this  method  for  the  rest  of  this  chapter  anyway. 

We  begin  by  showing  that  neither  sep  nor  sepx  are  likely  to  be 
significantly  smaller  than  their  trivial  upper  bounds  in  (8.2).  From  Lemma 
2.8  we  have 

min  1^(4)  -  X*(2?)| 

ndn  I X*(4 )  -  \j(B)\  *  »ep(4,2?)>  - 

where  S4  is  a  (best  conditioned)  diagonalizing  similarity  for  4  and  Sg  simi¬ 
larly  diagonalizes  B.  Since  in  our  model  4  and  B  are  chosen  independently, 
we  can  use  Theorem  8.4  to  estimate  the  distribution  of  c(5^)  tc(Sg).  the 
ratio  of  the  upper  to  lower  bounds  in  the  last  inequality.  A  little  manipulation 
shows  that  the  probability  of  this  ratio  exceeding  K  is  0(K~}). 

The  ratio  of  the  upper  to  lower  bounds  on  sepA  in  Lemma  2.15 
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nun  |Xi(A)  -  Xj(i?)|  min \K(A) -X,(5)| 

2  sepxtA.S)  a=  g  .max  (k(Sa)  .  *(S*)) 

is 

max  WS*)  .  *(S*»  . 

An  application  of  Theorem  8.4  shows  that  this  last  quantity  exceeds  K  with 
probability  0(K~Z). 

The  ratio  of  the  upper  bound  on  sepA  to  the  lower  bound  on  sep,  a  crude 
upper  bound  on  sep*/  sep,  is 

k(Sa)  ■  k(Sb) 

2 

which  from  the  above  discussion  exceeds  K  with  probability  0(K~l). 

A  more  detailed  way  to  bound  sep^/sep  is  as  follows:  we  identify  a 
variety  V  with  the  property  that  for  pairs  ( A.B )  sufficiently  far  from  V. 
sep X(A.B)/  sep (A.B)  will  be  bounded  above  by  a  constant  depending  on  the 
distance  from  V.  From  previous  discussion  we  know  V  has  to  lie  within  the  set 
P  where  sep=sep)k=0.  On  the  set  P  we  know  A  and  B  have  to  have  at  least  one 
eigenvalue  X  in  common.  By  lemmas  2.8  and  2.7  (for  sep)  and  lemmas  2.13 
and  2.14  (for  sepjJ  we  know  it  suffices  to  look  at  the  parts  of  A  and  B  with 
common  eigenvalue  X.  By  lemma  2. 12  we  know  that  as  long  as  X  is  a  simple 
eigenvalue  of  either  A  or  B,  then  sep  and  sep*  can  not  differ  too  much.  Thus, 
V  must  consist  of  those  pairs  (A.B)  where  A  and  B  have  a  common  eigen¬ 
value  X.  and  where  both  A  and  B  have  X  as  a  multiple  eigenvalue.  This  is  a 
total  of  3  independent  constraints,  meaning  V  has  codimension  3  in  the  real 
case  and  8  in  the  complex.  As  long  as  the  pair  (A.B)  does  not  fall  into  a  small 
neighborhood  of  this  set  V  of  high  codimension,  sepA/  sep  will  not  be  too  big. 
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Chapter  9:  Relevance  of  the  Probabilistic  Model  to  Finite  Precision  Calcula¬ 
tions 

9.1  Introduction 

What  relevance  does  the  probabilistic  model  of  the  last  three  chapters 
have  to  actual  finite  precision  calculations?  We  will  see  that  the  model  can 
predict  certain  behaviors  of  algorithms  designed  to  solve  the  problems  of 
chapter  7  (matrix  inversion,  eigendecompositions,  polynomial  zero  finding). 
The  tool  required  to  analyze  these  algorithms  is  backwards  error  analysis; 
using  it  one  can  show  that  unless  the  problem  to  be  solved  is  too  close  to 
some  set  P  of  ill-posed  problems,  a  "backwards  stable  algorithm"  will  com¬ 
pute  an  accurate  answer.  For  example,  engineers  have  a  rule  of  thumb  that 
"to  get  an  answer  to  a  certain  precision  (say  3  decimal  places)  it  suffices  to 
do  the  intermediate  calculations  to  about  twice  that  precision  (6  decimal 
places)*'  [Kahan2].  It  will  turn  out  that  the  model  predicts  this  behavior  by 
the  measuring  the  relative  rarity  of  matrices  with  triple  eigenvalues  (or  poly¬ 
nomials  with  triple  zeros)  compared  to  matrices  with  double  eigenvalues  (or 
polynomials  with  double  zeros). 

The  explanatory  power  of  the  model  is  limited  by  the  underlying  proba¬ 
bility  distribution  of  problems  it  assumes:  U/\\Ii\\g  should  be  uniformly 
distributed,  where  U  is  a  random  problem.  Certain  classes  of  problems  sim¬ 
ply  do  not  generate  this  distribution.  For  example,  we  will  see  later  that 
using  Rayleigh  quotient  iteration  to  compute  eigenvectors  of  a  symmetric 
matrix  requires  solving  a  sequence  of  more  and  more  nearly  singular  sys¬ 
tems  of  linear  equations.  In  fact,  the  more  nearly  singular  the  system,  the 
better  the  resulting  answer.  It  is  cleanly  nonsense  to  model  the  set  of 
matrices  being  inverted  as  coming  from  a  uniform  distribution.  Other  classes 
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of  problems  with  predilections  for  producing  nearly  singular  matrices  are 
least  squares  problems  (when  solved  using  the  normal  equations)  and  finite 
difference  schemes  for  differential  equations.  The  most  significant  limitation 
of  the  model  is  that  it  uses  a  continuous  distribution  of  problems;  all  points 
on  the  unit  sphere  are  in  a  certain  sense  equally  likely.  In  actual  computa¬ 
tions,  however,  the  set  of  possible  problems  is  discrete  and  finite.  There  are 
only  a  (huge)  finite  number  of  finite  precision  numbers  representable  in  a 
computer,  and  hence  only  a  finite  number  of  finite  precision  matrices,  poly¬ 
nomials,  etc.  It  will  turn  out  that  this  discreteness  leads  to  qualitatively 
different  behavior  of  algorithms  than  is  predicted  by  the  model.  The  continu¬ 
ous  model  makes  sense  only  so  long  as  the  finite  precision  numbers  are 
dense  enough  to  resemble  the  continuum.  In  Figure  9.1.  for  example,  the 
volume  of  the  set  of  points  within  distance  4e  of  the  curve  P  is  a  good 
approximation  to  the  number  of  dots  (finite  precision  points)  within  distance 
4e  of  P.  This  is  true  because  the  radius  of  the  neighborhood  of  P  (4 e)  is  large 
compared  to  the  spacing  between  finite  precision  points  (e).  In  Figure  9.2,  on 
the  other  hand,  the  volume  of  points  within  distance  c/4  of  P  is  not  neces¬ 
sarily  a  good  approximation  of  the  number  of  dots  within  c/4  of  P.  Thus, 
when  the  radius  of  the  neighborhood  of  P  get  smaller  than  the  interdot  dis¬ 
tance  e.  the  model  breaks  down.  The  breakdown  of  the  model  is  critical  if  one 
is  trying  to  analyze  the  behavior  of  real  algorithms  running  in  finite  precision 
arithmetic.  For  example,  we  will  see  that  one  can  measure  the  difficulty  of 
inverting  a  matrix  with  the  condition  number  x'(^f)=jj  M] ■  )|  A#"1 |J .  When 
using  an  iterative  algorithm  to  compute  the  inverse,  the  number  of  iterations 
needed,  if  very  large,  is  roughly  proportional  to  k!(M)  for  many  algorithms. 
Thus,  we  could  ask  what  (according  to  the  model)  is  the  average  number  of 
iterations  needed  to  invert  a  random  matrix?  This  is  roughly  proportional  to 
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the  average  condition  number.  In  the  case  of  real  matrices,  it  will  turn  out 
that  the  model  predicts  an  infinite  average  condition  number.  In  fact,  the 
model  predicts  that  the  average  condition  number  of  matrices  whose  condi¬ 
tion  numbers  are  restricted  to  be  greater  than  K  is  infinite  for  any  positive 
K.  This  is  because  the  integral  expressing  the  average  condition  number 
looks  like 

I  §-  (»■« 

K  x 

which  is  infinite  for  all  positive  K.  However,  since  there  are  only  finitely  many 
finite  precision  matrices,  the  average  condition  number  of  those  that  are  not 
exactly  singular  must  be  finite.  Thus,  the  model  does  not  supply  us  any  useful 
information  in  this  case. 

What  if  we  could  compute  the  actual  probability  distribution  of  the 
number  of  points  within  distance  e  of  the  variety  P  of  ill-posed  problems  for 
the  finite  precision  case?  It  would  tell  us  how  many  single  precision  problems 
we  could  solve  as  a  function  of  the  extra  precision  used  in  intermediate  cal¬ 
culations.  For  example,  in  the  case  of  inverting  real  matrices,  if  the  actual 
probability  distribution  were  roughly  linear  as  in  the  continuous  case,  then 
each  bit  of  extra  precision  used  would  allow  us  to  solve  half  the  problems  we 
couldn't  solve  before.  We  present  some  simulations  to  substantiate  this  claim 
below.  Clearly,  such  information  would  be  of  great  use  in  the  design  of 
numerical  algorithms  or  even  computer  arithmetic  units,  because  it  would 
tell  the  designer  how  to  trade  off  the  cost  of  arithmetic  (which  is  an  increas¬ 
ing  function  of  the  number  of  bits  of  precision)  with  the  number  of  problems 
the  system  can  solve. 

The  rest  of  this  chapter  is  organized  as  follows.  Section  9.2  shows  how 
backwards  error  analysis  makes  the  probability  model  relevant  to  finite 


precision  calculations.  Section  9.3  discusses  the  limitations  of  the  model 
mentioned  above.  Finally,  section  9.4  demonstrates  the  usefulness  of  knowing 
the  discrete  distribution  of  problems  for  analyzing  the  use  of  extra  precision 
arithmetic. 

9.2  A  Paradigm  for  Analyzing  the  Accuracy  of  Algorithms 

The  paradigm  for  applying  the  probabilistic  model  to  the  analysis  of 
algorithms  is  as  follows: 

(1)  Within  the  space  of  problems,  identify  the  set  P  of  ill-posed  ones. 

(2)  Show  that  the  closer  a  problem  is  to  P,  the  more  sensitive  the  solution 
is  to  small  changes  in  the  problem. 

(3)  Show  that  the  algorithm  in  question  computes  an  accurate  solution  for  a 
problem  close  to  the  one  it  received  as  input  (this  is  known  as  "back¬ 
wards  stability"  [Wilkinsonl]).  Combined  with  the  result  of  (2),  this  will 
show  that  the  algorithm  will  compute  an  accurate  solution  to  a  problem 
so  long  as  the  problem  is  far  enough  from  P. 

(4)  Compute  the  probability  that  a  random  problem  is  close  to  P.  Using 
this  probability  distribution  in  conjunction  with  the  result  of  (3)  we  can 
compute  the  probability  of  the  algorithm  computing  an  accurate  result. 

This  paradigm  is  best  explained  by  applying  it  to  matrix  inversion: 

(1)  The  set  of  matrices  P  which  are  ill-posed  with  respect  to  inversion  are 
precisely  the  singular  matrices. 

(2)  As  discussed  in  section  7.2,  the  condition  number 

*  ]|  Af||  jr  •  U  Af-»j|  (9.2) 

measures  how  difficult  the  matrix  &  is  to  invert.  More  precisely,  it 

measures  how  much  a  relative  perturbation  in  U  can  be  magnified  in 
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U~ 1  [Kahanl]: 


=  Um  sup 


(9.3) 


As  also  discussed  in  section  7.2,  the  condition  number  can  be  expressed 
in  terms  of  the  distance  from  M  to  P: 


k'(M)  =\\M\\b  /  dist*(Af  .  P)  (9.4) 

as  required  by  the  paradigm. 

(3)  Gaussian  elimination  with  partial  pivoting  is  a  standard  algorithm  for 
matrix  inversion  and  is  well  known  to  be  a  backwards  stable  algorithm 
[Wilkinsonl].  Backwards  stability  means  that  when  applying  Gaussian 
elimination  to  compute  the  solution  of  the  system  of  linear  equations 
Ax  =6 ,  one  gets  an  answer  S  which  satisfies  (A+SA)S-b ,  where  6 A  is 
small  in  norm  compared  to  A.  More  exactly,  let  X  be  the  i-th  column  of 
the  approximation  to  computed  using  Gaussian  elimination,  where 
the  arithmetic  operations  performed  (addition,  subtraction,  multiplica¬ 
tion,  and  division)  are  all  rounded  off  to  6  bits  of  precision.  Then  X  is 
the  vcact  value  of  the  i-th  column  of  the  inverse  of  a  matrix 
where  M(i)  is  close  to  M  (the  subscript  i  means  column  i,  and  Af(i)  is 
the  i-th  in  a  sequence  of  n  by  n  matrices).  In  fact 


HM(i)  -U\\s*f(n)  2“*  •  H  #11,  (9.5) 

where  /(n)  is  a  function  only  of  n,  the  dimension  of  M  [Wilkinsonl]. 

This  last  expression  can  be  used  to  bound  the  relative  error  in  the  solu¬ 
tion  Xi  [Wilkinsonl]: 


II*  Kf(fi)  ■  f  (n)  2~* 

ll(Jf-,)ill*  1  -£(U)  /(n)  2-*  ’ 

In  other  words,  as  long  as  the  bound  on  the  distance  from 

not  so  large  that  Jtf(i)  could  be  singular,  i.e.  as  long  as 


(9.6) 
M(i)  to  U  is 
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di3t*(Jf  .P)>f(n)  Z~*  ■  ||#|j,fc  11  Jtf(-i)  -  M\\ g  (9.7) 

or,  substituting  from  equation  (9.4) 

c'(Af )  /  (n)  ■  2~*  <  1  ,  (9.0) 

then  the  relative  error  in  the  computed  inverse  X  is  bounded.  So.  as 

long  the  condition  number  ic'(Ii)  is  smaller  than  2*//  (n),  the  solution 

will  have  some  accuracy,  and  the  smaller  the  more  accurate  the 

solution. 


(4)  Now  we  can  apply  Corollary  7.2  which  gives  the  probability  distribution 
of  the  condition  number  to  estimate  the  probability  that  a  random 
matrix  can  be  inverted  accurately: 


'  IIUr'M,  '  ■/<»>  2- 


-«  e) 


(9.9) 


which,  after  some  rearrangement  (and  assuming  c<l  of  course)  equals 


.Prob  frWyftyrfogy) 

*  1  -„(„8-l)  •  /(n)8  •  (i±^8  Z-“  +  o(/(n)8  (i^8  !T“) 
(assuming  M  is  complex  and  applying  Corollary  7.2).  This  last  inequality 
only  makes  sense  for 


i±£- 

e 

small,  that  is  if  the  precision  2~*  used  in  the  computation  is  much 
smaller  than  the  precision  e  demanded  of  the  answer.  This  restriction 
also  makes  sense  numerically. 


Similar  analyses  are  possible  of  standard  algorithms  to  compute  eigen¬ 
values  and  eigenvectors  as  well  as  zeros  of  polynomials  [Wilkinson  1,  Wilkin- 
son2].  In  the  case  of  eigenvalue  problems,  the  ill-posed  set  P  consists  of 
those  matrices  with  multiple  eigenvalues,  the  higher  the  multiplicity  the 
more  ill-posed  the  problem.  Why  is  this?  It  is  well  known  that  the  eigenvalues 
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of  a  matrix  are  algebraic  functions  of  the  entries.  In  particular,  if  X  is  a  sim¬ 
ple  eigenvalue  of  the  matrix  A.  and  if  B  is  another  matrix,  then  the 
parameterized  matrix  A+eB  will  have  an  eigenvalue  X(s)  such  that  X(0)=X 
and  for  e  in  a  neighborhood  of  0,  X(e)  will  be  expressible  as  a  power  series  in 
e  [Kato2].  Interpreting  zB  as  a  perturbation  in  A  due  to  measurement  or 
roundoff  error,  we  see  that  the  perturbation  X(e)-X  in  v4‘s  eigenvalue  X 
depends  at  worst  linearly  on  e  for  small  e.  Now  what  if  X  is  an  m. -tuple  eigen¬ 
value  of  A ?  Then  it  is  well  known  [Wilkinson2]  that  X(e)  will  generically  be 
expressible  as  a  power  series  in  e1/m  for  small  e,  so  that  a  perturbation  zB  of 
order  e  in  A  results  in  a  perturbation  X(c)— X  of  order  zUm  in  A's  eigenvalue. 
Since  for  small  e  and  m>l,  e1/m  is  much  larger  than  e,  this  means  that 
errors  made  in  the  computation  of  multiple  or  nearly  multiple  eigenvalues 
are  large  compared  to  the  errors  in  a  simple  eigenvalue,  and  the  higher  the 
multiplicity  of  the  eigenvalue,  the  worse  the  error. 

We  can  relate  the  error  made  to  the  multiplicity  of  the  eigenvalue  being 
computed  in  a  more  precise  way.  Almost  all  algorithms  used  to  compute 
eigenvalues  are  backwards  stable  in  the  sense  that  they  compute  the  eigen¬ 
values  of  a  matrix  near  the  one  supplied  as  input.  As  is  the  case  of  matrix 
inversion,  the  distance  from  the  input  matrix  to  the  nearby  one  depends  cn 
the  precision  Z~*  used  in  the  calculations.  Thus,  the  zB  perturbation  of  the 
last  paragraph  is  of  order  2”*.  Therefore,  the  error  in  the  computed  value  of 
an  m-tuple  eigenvalue  will  be  of  order  2“*/m  by  the  argument  of  the  last 
paragraph.  In  other  words,  if  we  do  our  calculations  using  6  bits  of  precision, 
we  can  only  expect  about  b/m  bits  of  precision  in  the  computed  value  of  an 
m-tuple  eigenvalue.  If  m=2,  for  example  (a  double  eigenvalue),  we  expect  to 
lose  half  our  precision.  This  analysis  tells  us  how  much  precision  is  needed  to 
compute  eigenvalues  accurately  to  the  basic  precision  2~*.  Since  we  lose  half 
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the  precision  when  computing  double  eigenvalues,  double  precision  2~**  will 
get  double  eigenvalues  accurate  to  single  precision.  Similarly,  m-tuple  preci¬ 
sion  2~m*  is  need  to  expect  to  compute  m-tuple  eigenvalues  accurately. 

How  likely  are  multiple  eigenvalues  according  to  our  probabilistic 
model?  According  to  theorem  7.5,  the  codimension  of  the  set  of  complex  n 
by  n  matrices  with  at  least  one  m-tuple  eigenvalue  is  2(m-l),  and  so  by 
Theorem  6.3  the  distribution  of  matrices  within  distance  e  of  one  with  an  m- 
tuple  eigenvalue  is  asymptotically  proportional  to  As  m  grows,  the 

exponent  of  e  increases,  and  decreases.  Said  another  way,  for  small 

enough  e,  matrices  with  multiple  (double  or  more)  eigenvalues  are  very  rare 
in  the  set  of  all  matrices,  matrices  with  triple  (or  more)  eigenvalues  are  very 
rare  in  the  set  of  matrices  with  multiple  eigenvalues,  and  so  on. 

Recall  now  the  engineer’s  rule  of  thumb:  "double  precision  in  intermedi¬ 
ate  calculations  is  enough  to  get  the  answer  to  single  precision."  The  model 
can  be  used  to  explain  this  empirical  observation.  Most  eigenvalue  problems 
involve  simple  eigenvalues,  and  for  these  single  precision  suffices  to  compute 
a  satisfactory  answer.  Rarely,  one  has  to  compute  a  nearly  double  eigen¬ 
value,  and  for  these  double  precision  suffices.  Much  more  rarely,  one  needs 
yet  higher  precision,  but  the  occurrence  of  these  triple  and  higher  multiple 
zeros  is  so  rare  that  double  precision  is  almost  always  enough.  A  similar 
analysis  applies  to  computing  multiple  zeros  of  polynomials. 

The  discussion  of  the  last  few  paragraphs  has  been  far  from  rigorous, 
using  asymptotic  results  of  dubious  validity  to  explain  an  empirical  observa¬ 
tion  stated  without  evidence.  Nonetheless,  it  demonstrates  the  power  of  the 
paradigm  stated  at  the  beginning  of  this  section.  In  the  next  section  we  dis¬ 
cuss  when  the  results  of  the  model  are  indeed  inapplicable  and  misleading. 
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We  may  use  the  same  kind  of  paradigm  as  discussed  so  far  to  analyze 
the  speed  of  convergence  of  an  algorithm  rather  than  its  accuracy.  In  this 
case  the  paradigm  is 

(1‘)  Identify  the  ill-posed  problems  P'. 

(2')  Show  that  the  closer  a  problem  is  to  P\  the  slower  the  algorithm  con¬ 
verges. 

(3‘)  Compute  the  probability  that  a  random  problem  is  close  to  P".  Com¬ 
bined  with  (2’)  this  yields  the  probability  distribution  of  the  speed  of 
convergence. 

This  approach  has  been  used  by  Smale  [Smale]  in  his  average  speed 
analysis  of  Newton's  method  for  finding  zeros  of  polynomials. 

9.3  limitations  of  the  Probabilistic  Model 

In  this  section  we  discuss  two  examples  illustrating  the  breakdown  of  the 
model.  Both  examples  show  behavior  of  widely  used  algorithms  which 
disagrees  with  the  predictions  of  the  model  because  of  the  effects  of  finite 
precision  arithmetic.  In  addition,  the  first  example  shows  how  the  assump¬ 
tion  of  uniformity  of  U/  | !  U 1 1  g  breaks  down  even  in  exact  arithmetic. 

The  first  example  is  Rayleigh  quotient  iteration,  which  is  used  to  com¬ 
pute  the  eigenvalues  and  eigenvectors  of  a  symmetric  matrix  A.  If  x0  is  an 
initial  guess  at  an  eigenvector,  the  algorithm  proceeds  as  follows: 

\  =  XiAti  /  x/x* 

*ui  -  {A  —  \)~x  • 

The  idea  is  that  if  x*  is  a  good  approximation  to  an  eigenvector,  then  \+i  (the 
Rayleigh  quotient)  is  a  good  approximation  to  an  eigenvalue,  and  in  turn  zu.1 
is  an  even  better  approximate  eigenvector.  In  fact,  the  asymptotic  conver¬ 
gence  rate  is  cubic  under  some  weak  assumptions  on  the  distribution  of  X's 
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eigenvalues  (i.e.  I  Anjys-Ai+il  is  of  the  order  of  IAjj?^— X*  ] a  when 
I  Afjyyy-Ai  |  is  small  enough  [Parlett]).  Note  that  as  A*  converges  to  an  eigen¬ 
value.  the  more  nearly  singular  the  matrix  A— A*  becomes.  In  exact  arith¬ 
metic,  of  course,  if  xi  is  an  exact  eigenvector,  -d-A*  will  be  exactly  singular. 

The  sequence  of  matrices  to  be  inverted  (actually,  one  solves  the  linear 
systems  (v4-Ai)il+I=ii  directly  rather  than  compute  {A-\i)~x)  becomes 
more  and  more  nearly  singular,  so  that  the  distribution  of  matrices  to  be 
(conceptually)  inverted  is  far  from  uniformly  distributed.  This  invalidates  the 
assumption  of  the  model,  even  in  exact  arithmetic.  How  does  Rayleigh  quo¬ 
tient  iteration  behave  in  finite  precision  arithmetic?  The  discussion  of  sec¬ 
tion  9.1  might  lead  us  to  doubt  that  it  works  at  all,  since  we  showed  there 
that  one  can  not  expect  an  accurate  solution  to  a  nearly  singular  system  of 
linear  equations.  In  fact,  Rayleigh  quotient  iteration  works  extremely  well 
because  the  rounding  errors  committed  in  the  course  of  computing  xt+, 
provably  conspire  to  produce  an  error  lying  almost  certainly  in  the  direction 
of  the  desired  eigenvector.  In  fact,  when  A*  has  almost  converged  to  am 
eigenvalue,  the  rounding  errors  will  swamp  the  computation  so  that 
almost  certainly  becomes  the  desired  eigenvector  and  further  iterations 
serve  only  to  make  small,  random  changes  in  x  without  improving  its  accu¬ 
racy.  In  other  words,  there  is  an  effect  due  to  finite  precision  arithmetic 
which  makes  the  algorithm  converge  very  quickly,  so  the  asymptotically 
cubic  convergence  rate  is  rarely  observed  for  long.  Therefore,  any  average 
speed  analysis  of  Rayleigh  quotient  iteration  which  ignores  the  effects  of 
finite  precision  arithmetic  may  be  misleading.  For  a  further  discussion  of 
Rayleigh  quotient  iteration  see  [Parlett]. 
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For  the  second  example  we  return  to  matrix  inversion.  As  discussed  in 
section  9.1,  the  condition  number  is  a  measure  of  the  anticipated  accuracy 
in  the  computed  inverse  of  a  matrix.  It  can  also  be  used  to  measure  the 
speed  of  convergence  of  many  iterative  algorithms  for  computing  the  inverse 
to  within  a  given  precision  [Wilkinson2].  Therefore,  a  reasonable  question  to 
ask  is  the  following:  what  is  the  expected  condition  number  of  a  random  real 
matrix?  Let  us  see  what  the  model  says.  Even  though  we  only  have  an  asymp¬ 
totic  upper  bound  on  the  distribution  of  k'{M)  of  the  form  (Corollary  7.2): 

Prob(«'(A/)iA’)  £  const  K~l  +  o(JT*) 

it  is  clear  that  for  large  enough  K  the  probability  distribution  function  will 
also  be  bounded  below  by  a  constant  multiple  of  K~l.  This  is  because  within 
the  variety  of  singular  matrices  lies  a  manifold  (perhaps  small)  of  codimen¬ 
sion  1  which  does  have  an  e  spherical  neighborhood  for  sufficiently  small  e 
(sufficiently  large  K)  which  by  Weyl’s  theorem  has  a  volume  given  asymptoti¬ 
cally  by  a  constant  multiple  of  K~x.  Thus,  the  integral  which  expresses  the 
expected  condition  number  will  be  bounded  below  by 

E(k'{U))  as  £  const  •  —~ 

and  this  integral  diverges  to  infinity  no  matter  how  large  Kq  is.  However, 
since  there  are  only  a  finite  number  of  finite  precision  matrices,  there  is 
some  Kq  such  that  no  finite  precision  matrix  that  is  not  exactly  singular  has 
a  condition  number  greater  than  Kq.  Therefore  the  value  of  the  integral  is 
determined  entirely  by  integrating  over  a  range  of  condition  numbers  which 
do  not  correspond  to  any  finite  precision  matrices.  Clearly,  this  model  is  not 
telling  us  anything  useful  in  this  case. 

In  the  case  of  complex  matrices,  the  corresponding  integral  does  con¬ 
verge,  because  for  sufficiently  large  t£  it  is  dominated  by 
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/const  ~r  . 

*0 

wbich  converges.  The  results  are  however  still  not  trustworthy  because  we 
are  integrating  over  the  region  within  distance  2 of  the  variety  P  of  singu¬ 
lar  matrices,  where  2“*  is  the  separation  between  adjacent  finite  precision 
numbers  (see  Figure  9.2),  where  the  model  breaks  down.  In  the  next  section 
we  discuss  what  we  could  do  if  we  could  extend  the  model  to  this  region  close 
toP. 

9.4  How  to  Use  the  Discrete  Distribution  at  Points  Within  Distance  e  of  a 
Variety 

Before  proceeding,  we  need  to  say  what  probability  measure  we  are 
going  to  put  on  the  discrete  set  of  finite  precision  points.  Section  9.3  showed 
that  no  single  distribution  is  good  for  all  applications,  but  a  uniform  distribu¬ 
tion  remains  a  neutral  and  interesting  choice.  Therefore  for  the  sake  of  dis¬ 
cussion  the  probability  we  asrgn  to  the  point  ii  will  be  proportional  to  the 
volume  of  the  small  parallelepiped  of  points  which  round  to  ii  (i.e.  the  paral¬ 
lelepiped  centered  at  Ii  with  sides  equal  in  length  to  the  distance  between 
adjacent  finite  precision  points).  In  the  case  of  fixed  point  arithmetic  [Wil¬ 
kinson  1],  this  means  that  each  point  has  equal  probability,  whereas  with 
floating  point  arithmetic  points  near  0  have  smaller  probability  than  larger 
points,  since  points  near  0  are  closer  together  than  points  farther  away. 
(Actually,  the  question  of  the  distribution  of  the  digits  of  a  floating  point 
number  has  a  large  literature  [Hamming,  Bareiss].  The  discussion  in  this  sec¬ 
tion  does  not  depend  on  the  actual  distribution  of  digits  chosen). 

We  claim  that  knowing  the  probability  distribution  of  the  distance  of  a 
random  finite  precision  problem  to  the  set  P  of  ill-posed  problems  will  tell  us 
how  many  finite  precision  problems  we  can  solve  as  a  function  of  the  extra 
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precision  used  in  intermediate  calculations.  As  mentioned  before,  program¬ 
mers  often  resort  to  extra  precision  arithmetic  to  get  more  accurate  solu¬ 
tions  to  problems  which  are  given  only  to  single  precision.  This  extra  preci¬ 
sion  has  a  cost  (speed)  dependent  on  the  number  of  digits  carried,  so  pro¬ 
grammers  usually  avoid  extra  precision  unless  persuaded  otherwise  by  bad 
experiences,  an  error  analysis,  or  paranoia.  Therefore  an  accurate  estimate 
of  how  many  problems  can  be  solved  as  a  function  of  the  extra  precision  used 
would  not  only  help  programmers  decide  how  much  to  use  but  possibly 
influence  designers  when  they  decide  how  much  precision  to  make  available 
in  their  computer  systems. 

How  does  knowledge  of  this  probability  distribution  tell  us  how  much 
extra  precision  to  use?  The  paradigm  in  section  9.2  tells  us  how.  A  backwards 
stable  algorithm  using  extra  precision  gets  an  accurate  solution  to  a  problem 
in  a  small  ball  around  the  input  problem.  The  radius  of  this  ball  depends  on 
the  extra  precision  used.  Therefore,  we  can  expect  to  accurately  solve  prob¬ 
lems  lying  within  2“*  of  P,  where  2~*  is  the  distance  between  adjacent  finite 
precision  numbers  in  the  input  data,  since  the  small  ball  around  the  input 
problem  will  be  bounded  away  from  the  set  P.  The  probability  distribution 
tells  us  as  before  how  many  problems  lie  within  a  given  distance  of  P,  and  so 
it  tells  us  how  many  problems  we  can  solve  that  we  couldn’t  solve  before. 

This  discussion  has  assumed  so  far  that  the  finite  precision  input  is 
known  exactly,  i.e.  that  there  is  no  error  inherited  from  previous  computa¬ 
tions  or  from  measurement  errors.  In  general  there  will  be  such  errors,  and 
they  will  almost  always  be  at  least  a  few  units  in  the  last  place  of  the  input 
problem.  In  other  words,  there  already  is  a  ball  of  uncertainty  around  the 
input  problem  with  a  radius  equal  to  a  small  multiple  of  the  interpoint  dis- 
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tance  2~V  Therefore,  it  may  make  no  sense  to  use  higher  precision  to  accu¬ 
rately  solve  problems  lying  very  close  to  P  when  the  inherited  input  error  is 
so  large  that  the  true  answer  is  inherently  very  uncertain.  In  such  situations 
programmers  usually  shrug  and  settle  for  the  backwards  stability  provided 
by  the  algorithm,  even  if  the  delivered  solution  is  entirely  wrong,  because 
the  act  of  solution  has  scarcely  worsened  the  uncertainty  inherited  from  the 
data,  and  the  programmer  declines  to  be  held  responsible  for  the  uncer¬ 
tainty  inherent  in  the  data. 

Nevertheless,  we  close  with  an  example  using  the  actual  discrete  distri¬ 
bution.  Consider  the  rather  simple  problem  of  inverting  2  by  2  matrices.  This 
problem  is  small  enough  that  we  can  actually  exhaustively  compute  the 
desired  discrete  probability  distribution  for  low  precision  arithmetic.  We  did 
this  for  3,  4,  5.  6  and  7  bit  fixed  point  arithmetic  (all  numbers  lay  between  0 
and  1  in  absolute  value),  where  each  fixed  point  matrix  was  assigned  the 
same  probability.  In  all  cases,  we  observed  approximately  linear  behavior  of 
the  probability  distribution  (as  predicted  by  the  continuous  model)  both  for 
distances  e  to  the  nearest  singular  matrix  larger  than  2“*  (3£b*»7),  and  for  t 
smaller  than  2~*  (the  fraction  of  problems  within  Z~*  of  a  singular  matrix  was 
about  21-*).  This  linear  behavior  continued  until  e  reached  approximately 
2~**f  and  there  the  graph  of  the  distribution  became  horizontal  and 
remained  so  all  the  way  to  the  origin,  intersecting  the  vertical  axis  at  about 
2*~*.  meaning  that  all  matrices  closer  to  P  than  approximately  2~n  were 
exactly  singular,  and  that  the  fraction  of  matrices  which  were  exactly  singu¬ 
lar  was  2*”** .  See  figure  9.3  for  a  rough  sketch  of  this  observed  probability 
distribution.  What  does  this  tell  us  about  the  use  of  extra  precision?  Basi¬ 
cally,  as  long  as  the  distribution  function  remains  linear,  it  says  that  for 
every  extra  bit  of  intermediate  precision,  we  can  solve  half  the  problems  we 
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couldn’t  solve  before.  This  regime  continues  until  we  reach  double  precision, 
at  which  point  the  only  problems  we  can’t  solve  are  exactly  singular.  Indeed, 
since 


a  c 
b  d 


d  — b 
— c  a 


-  (ad-bc)~l 

we  can  clearly  compute  the  inverse  accurately  if  we  can  compute  the  deter¬ 
minant  ad— be  accurately.  Since  a.  b,  c  and  d  are  given  to  single  precision, 
double  clearly  suffices  to  compute  ad -be  exactly. 


Of  course,  exhaustive  evaluation  of  the  distribution  function  is  not  rea¬ 
sonable  for  large  problems,  and  evaluating  the  distribution  function  becomes 
an  interesting  question  of  Diophantine  approximation. 


Figure  9.3 


Observed  Probability  Distribution  of  the 
Distance  €  to  the  nearest  Singular  Katrlx 
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